Mozilla internal string guide

  • Revision slug: Mozilla_internal_string_guide
  • Revision title: Mozilla internal string guide
  • Revision id: 63343
  • Created:
  • Creator: Tservo
  • Is current revision? No
  • Comment /* String Guidelines */

Revision Content

Introduction

The string classes are a library of C++ classes used to manage buffers of Unicode and single-byte character strings. They reside in the Mozilla codebase in the xpcom/string directory.

Abstract (interface) classes begin with "nsA" and concrete classes simply begin with "ns". Classes with "CString" in the name store 8-bit bytes (chars), which may hold single-byte ASCII strings, multibyte Unicode strings encoded in UTF-8, or strings in a legacy (single-byte or multibyte) character encoding (e.g. ISO-8859-1, Shift_JIS, GB2312, KOI8-R). All other classes simply have "String" in their name and refer to 16-bit strings made up of PRUnichars. For example, nsAString is an abstract class for storing Unicode characters in UTF-16 encoding, and nsDependentCString is a concrete class which stores an 8-bit string. Every 16-bit string class has an equivalent 8-bit string class. For example, nsCString is the 8-bit string class which corresponds to nsString.

8-bit and 16-bit string classes have completely separate base classes, but share the same APIs. As a result, you cannot assign an 8-bit string to a 16-bit string without some kind of conversion helper class or routine. For the purpose of this document, we will refer to the 16-bit string classes in class documentation. It is safe to assume that every 16-bit class has an equivalent 8-bit class.

String Guidelines

Follow these simple rules in your code to keep your fellow developers, reviewers, and users happy.

The Abstract Classes

Read-only strings

As function parameters

The Concrete Classes - which classes to use when

Iterators

Looping with iterators - performance issues

Helper Classes and Functions

Searching strings - looking for substrings, characters, etc.

Memory Allocation - how to avoid it, which methods to use

Substrings (string fragments)

Unicode Conversion ns*CString vs. ns*String

Common Patterns

Callee-allocated Parameters

Literal Strings

String Concatenation

Local variables

Member variables

Raw Character Pointers

IDL

IDL String types

C++ Signatures

Revision Source

<h2 name="Introduction"> Introduction </h2>
<p>The string classes are a library of C++ classes used to manage buffers of Unicode and single-byte character strings. They reside in the Mozilla codebase in the xpcom/string directory.
</p><p>Abstract (interface) classes begin with "nsA" and concrete classes simply begin with "ns". Classes with "CString" in the name store 8-bit bytes (chars), which may hold single-byte ASCII strings, multibyte Unicode strings encoded in UTF-8, or strings in a legacy (single-byte or multibyte) character encoding (e.g. ISO-8859-1, Shift_JIS, GB2312, KOI8-R). All other classes simply have "String" in their name and refer to 16-bit strings made up of PRUnichars. For example, nsAString is an abstract class for storing Unicode characters in UTF-16 encoding, and nsDependentCString is a concrete class which stores an 8-bit string. Every 16-bit string class has an equivalent 8-bit string class. For example, nsCString is the 8-bit string class which corresponds to nsString.
</p><p>8-bit and 16-bit string classes have completely separate base classes, but share the same APIs. As a result, you cannot assign an 8-bit string to a 16-bit string without some kind of conversion helper class or routine. For the purpose of this document, we will refer to the 16-bit string classes in class documentation. It is safe to assume that every 16-bit class has an equivalent 8-bit class.
</p>
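<p>To illustrate why an 8-bit string cannot simply be assigned to a 16-bit string, here is a minimal sketch using standard C++ stand-ins: <code>std::string</code> plays the role of an 8-bit string holding UTF-8 and <code>std::u16string</code> plays the role of a 16-bit string holding UTF-16. The function name <code>Utf8ToUtf16</code> is hypothetical and is not part of the Mozilla string library; the sketch assumes mostly well-formed input and only shows the kind of work a conversion routine has to do.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Mozilla API): decode UTF-8 bytes into UTF-16
// code units. Invalid lead bytes are replaced with U+FFFD; continuation
// bytes are not validated, so this is a sketch rather than a strict decoder.
std::u16string Utf8ToUtf16(const std::string& in) {
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    unsigned char lead = static_cast<unsigned char>(in[i]);
    std::uint32_t cp;
    std::size_t extra;  // number of continuation bytes expected
    if (lead < 0x80)                { cp = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else                            { cp = 0xFFFD;      extra = 0; }
    ++i;
    // Fold in the low six bits of each continuation byte.
    for (; extra > 0 && i < in.size(); --extra, ++i) {
      cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
    }
    if (cp >= 0x10000) {
      // Code points beyond the BMP need a surrogate pair in UTF-16.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
      out.push_back(static_cast<char16_t>(cp));
    }
  }
  return out;
}

// Usage: the direct assignment does not compile, the explicit conversion does.
//   std::string narrow = "abc";
//   std::u16string wide = narrow;               // error: unrelated types
//   std::u16string wide = Utf8ToUtf16(narrow);  // OK
```

<p>The point of the sketch is that one 8-bit unit does not map to one 16-bit unit, so the encoding has to be named explicitly at the point of conversion.
</p>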
<h2 name="String_Guidelines"> String Guidelines </h2>
<p>Follow these simple rules in your code to keep your fellow developers, reviewers, and users happy.
</p>
<ul><li> Avoid <a href="#Unicode_Conversion_ns.2ACString_vs._ns.2AString">*WithConversion</a> functions at all costs: AssignWithConversion, AppendWithConversion, EqualsWithConversion, etc.
</li><li> Use the most abstract string class that you can. Usually this is:
<ul><li> <a href="#The_Abstract_Classes">nsAString</a> for function parameters
</li><li> <a href="#The_Concrete_Classes_-_which_classes_to_use_when">nsString</a> for member variables
</li><li> <a href="#The_Concrete_Classes_-_which_classes_to_use_when">nsAutoString or nsXPIDLString</a> for local (stack-based) variables 
</li></ul>
</li><li> Use <a href="#Literal_Strings">NS_LITERAL_[C]STRING / NS_NAMED_LITERAL_[C]STRING</a> to represent literal strings (i.e. "foo") as nsAString-compatible objects.
</li><li> Use <a href="#String_Concatenation">string concatenation</a> (i.e. the "+" operator) when combining strings.
</li><li> Use <a href="#Raw_Character_Pointers">nsDependentString</a> when you have a raw character pointer that you need to convert to an nsAString-compatible string.
</li><li> Use Substring() to extract fragments of existing strings.
</li><li> Use iterators to parse and extract string fragments.
</li></ul>
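<p>The Mozilla classes named in these guidelines cannot be compiled outside the Mozilla tree, so the following is only a rough analogy in standard C++17, not Mozilla code. It assumes that <code>std::u16string_view</code> can stand in for an abstract read-only parameter (and for wrapping a raw pointer without copying), that <code>substr</code> can stand in for extracting a fragment, and that <code>std::u16string</code> can stand in for an owning string. The function names <code>SchemeOf</code> and <code>MakeUrl</code> are hypothetical.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Take the most abstract read-only type as the parameter, so callers can
// pass a literal, a raw pointer, or an owning string without copying.
// Walks the string with iterators and returns the fragment before the
// first ':' (or the whole string if there is no ':').
std::u16string_view SchemeOf(std::u16string_view url) {
  auto it = url.begin();
  while (it != url.end() && *it != u':') {
    ++it;
  }
  return url.substr(0, static_cast<std::size_t>(it - url.begin()));
}

// Combine pieces with operator+ rather than repeated manual appends.
std::u16string MakeUrl(std::u16string_view scheme, std::u16string_view rest) {
  return std::u16string(scheme) + u"://" + std::u16string(rest);
}

// Usage:
//   const char16_t* raw = u"https://example.org";  // raw character pointer
//   std::u16string_view scheme = SchemeOf(raw);    // wrapped, not copied
//   std::u16string url = MakeUrl(scheme, u"example.org");
```

<p>The returned fragment only refers to the caller's buffer, so in this sketch it must not outlive the string it was taken from.
</p>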
<h2 name="The_Abstract_Classes"> The Abstract Classes </h2>
<h3 name="Read-only_strings"> Read-only strings </h3>
<h3 name="As_function_parameters"> As function parameters </h3>
<h2 name="The_Concrete_Classes_-_which_classes_to_use_when"> The Concrete Classes - which classes to use when </h2>
<h2 name="Iterators"> Iterators </h2>
<h3 name="Looping_with_iterators_-_performance_issues"> Looping with iterators - performance issues </h3>
<h2 name="Helper_Classes_and_Functions"> Helper Classes and Functions </h2>
<h3 name="Searching_strings_-_looking_for_substrings.2C_characters.2C_etc."> Searching strings - looking for substrings, characters, etc. </h3>
<h3 name="Memory_Allocation_-_how_to_avoid_it.2C_which_methods_to_use"> Memory Allocation - how to avoid it, which methods to use </h3>
<h3 name="Substrings_.28string_fragments.29"> Substrings (string fragments) </h3>
<h2 name="Unicode_Conversion_ns.2ACString_vs._ns.2AString"> Unicode Conversion ns*CString vs. ns*String </h2>
<h2 name="Common_Patterns"> Common Patterns </h2>
<h3 name="Callee-allocated_Parameters"> Callee-allocated Parameters </h3>
<h3 name="Literal_Strings"> Literal Strings </h3>
<h3 name="String_Concatenation"> String Concatenation </h3>
<h3 name="Local_variables"> Local variables </h3>
<h3 name="Member_variables"> Member variables </h3>
<h3 name="Raw_Character_Pointers"> Raw Character Pointers </h3>
<h2 name="IDL"> IDL </h2>
<h3 name="IDL_String_types"> IDL String types </h3>
<h3 name="C.2B.2B_Signatures"> C++ Signatures </h3>