Mozilla internal string guide

  • Revision slug: Mozilla_internal_string_guide
  • Revision title: Mozilla internal string guide
  • Revision id: 63361
  • Created:
  • Creator: Tservo
  • Is current revision? No
  • Comment /* Read-only strings */

Revision Content

Introduction

The string classes are a library of C++ classes which are used to manage buffers of Unicode and single-byte character strings. They reside in the Mozilla codebase in the xpcom/string directory.

Abstract (interface) classes begin with "nsA" and concrete classes simply begin with "ns". Classes with "CString" in the name store 8-bit bytes (chars), which may hold single-byte ASCII strings, multibyte Unicode strings encoded in UTF-8, or strings in a (multibyte or single-byte) legacy character encoding (e.g. ISO-8859-1, Shift_JIS, GB2312, KOI8-R). All other classes simply have "String" in their name and refer to 16-bit strings made up of PRUnichars. For example, nsAString is an abstract class for storing Unicode characters in UTF-16 encoding, and nsDependentCString is a concrete class that stores an 8-bit string. Every 16-bit string class has an equivalent 8-bit string class; for example, nsCString is the 8-bit string class that corresponds to nsString.

8-bit and 16-bit string classes have completely separate base classes but share the same APIs. As a result, you cannot assign an 8-bit string to a 16-bit string without some kind of conversion helper class or routine. For the purposes of this document, we will refer to the 16-bit string classes in class documentation; it is safe to assume that every 16-bit class has an equivalent 8-bit class.
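The separation between the two string families can be made concrete with a minimal sketch in standard C++. It assumes char16_t as a stand-in for PRUnichar and uses std::basic_string in place of the Mozilla classes. WidenLatin1 is a hypothetical helper, not a Mozilla API. It shows the kind of explicit conversion step that is required because the two families do not convert implicitly.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the two code-unit types (assumption: char16_t models PRUnichar).
using CodeUnit8 = char;       // what the "CString" classes store
using CodeUnit16 = char16_t;  // what the "String" classes store

// Two unrelated string types: neither converts implicitly to the other.
using String8 = std::basic_string<CodeUnit8>;
using String16 = std::basic_string<CodeUnit16>;

// Hypothetical explicit conversion helper: widens ISO-8859-1 (Latin-1) bytes
// to UTF-16. Every Latin-1 byte value equals its Unicode code point, so each
// byte maps to exactly one 16-bit code unit. This is NOT a UTF-8 decoder.
String16 WidenLatin1(const String8& narrow) {
  String16 wide;
  wide.reserve(narrow.size());
  for (CodeUnit8 c : narrow) {
    // Go through unsigned char so bytes >= 0x80 are not sign-extended.
    wide.push_back(static_cast<CodeUnit16>(static_cast<unsigned char>(c)));
  }
  assert(wide.size() == narrow.size());  // one code unit per input byte
  return wide;
}
```

With these types, `String16 w = String8("abc");` is rejected by the compiler. `String16 w = WidenLatin1(String8("abc"));` compiles, which mirrors the rule above.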

String Guidelines

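Follow these simple rules in your code to keep your fellow developers, reviewers, and users happy.

  • Avoid *WithConversion functions at all costs: AssignWithConversion, AppendWithConversion, EqualsWithConversion, etc.
  • Use the most abstract string class that you can. Usually this is:
      ◦ nsAString for function parameters
      ◦ nsString for member variables
      ◦ nsAutoString or nsXPIDLString for local (stack-based) variables
  • Use NS_LITERAL_[C]STRING / NS_NAMED_LITERAL_[C]STRING to represent literal strings (e.g. "foo") as nsAString-compatible objects.
  • Use string concatenation (i.e. the "+" operator) when combining strings.
  • Use nsDependentString when you have a raw character pointer that you need to convert to an nsAString-compatible string.
  • Use Substring() to extract fragments of existing strings.
  • Use iterators to parse and extract string fragments.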
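The string guidelines recommend nsDependentString for wrapping a raw character pointer and Substring() for extracting fragments. Both refer to existing characters instead of copying them. The sketch below illustrates that idea only by analogy. It uses C++17's std::u16string_view as a stand-in, and WrapRaw and Fragment are hypothetical helpers, not Mozilla APIs.

```cpp
#include <cstddef>
#include <string_view>

// Analogy for a dependent string: wraps a null-terminated buffer without
// copying it. The caller must keep `raw` alive while the view is in use.
std::u16string_view WrapRaw(const char16_t* raw) {
  return std::u16string_view(raw);  // length is found by scanning for the null
}

// Analogy for Substring(str, start, length): a view into the same buffer,
// so no characters are copied. The result is not null-terminated.
std::u16string_view Fragment(std::u16string_view str, std::size_t start,
                             std::size_t length) {
  return str.substr(start, length);
}
```

Because neither helper owns its characters, the returned views share storage with the original buffer. That sharing is what makes this approach cheap, and it is also why the buffer's lifetime matters.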

The Abstract Classes

Every string class derives from nsAString (or nsACString). This class provides the fundamental interface for access and manipulation of strings. While concrete classes derive from nsAString, nsAString itself cannot be instantiated.

This is very similar to the idea of an "interface" that Mozilla uses to describe abstract object descriptions in the rest of the codebase. In the case of interfaces, class names begin with "nsI" where "I" refers to "Interface". In the case of strings, abstract classes begin with "nsA" and the "A" means "Abstract".

There are a number of abstract classes which derive from nsAString. These abstract subclasses also cannot be instantiated, but they describe a string in slightly more detail than nsAString. They guarantee that the underlying implementation behind the abstract class provides specific capabilities above and beyond nsAString.

The list below describes the main base classes. Once you are familiar with them, see Appendix A - What class to use when.

  • nsAString: the abstract base class for all strings. It provides an API for assignment, individual character access, basic manipulation of characters in the string, and string comparison. This class corresponds to the XPIDL AString parameter type.
  • nsSubstring: the common base class for all of the string classes. Provides optimized access to data within the string. An nsSubstring is not necessarily null-terminated. (For backwards compatibility, nsASingleFragmentString is a typedef for this string class.)
  • nsString: builds on nsSubstring by guaranteeing null-terminated storage. This allows a method (.get()) to access the underlying character buffer. (For backwards compatibility, nsAFlatString is a typedef for this string class.)

The remainder of the string classes inherit from either nsSubstring or nsString. Thus, every string class is compatible with nsAString.
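The relationships in the list above can be modelled in a few lines of standard C++. This is an illustrative sketch only. AbstractString, Substring and FlatString are hypothetical stand-ins for nsAString, nsSubstring and nsString, and the real classes differ in layout and API. The sketch shows why a substring need not be null-terminated: it may point into the middle of another buffer. It also shows why only the flat class can offer get(), and how a function taking the abstract base by const reference accepts either kind.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Models nsAString: an interface. `AbstractString s;` would not compile.
class AbstractString {
 public:
  virtual ~AbstractString() = default;
  virtual std::size_t Length() const = 0;
  virtual char16_t CharAt(std::size_t i) const = 0;
};

// Models nsSubstring: a contiguous (pointer, length) view that is not
// necessarily null-terminated, so it can describe part of another buffer.
class Substring : public AbstractString {
 public:
  Substring(const char16_t* data, std::size_t length)
      : mData(data), mLength(length) {}
  std::size_t Length() const override { return mLength; }
  char16_t CharAt(std::size_t i) const override {
    assert(i < mLength);
    return mData[i];
  }
  const char16_t* Data() const { return mData; }  // direct contiguous access

 protected:
  const char16_t* mData;
  std::size_t mLength;
};

// Models nsString: owns its storage and guarantees a terminating null,
// which is what makes get() safe to pass to C-style string consumers.
class FlatString : public Substring {
 public:
  explicit FlatString(std::u16string value)
      : Substring(nullptr, 0), mStorage(std::move(value)) {
    mData = mStorage.c_str();  // std::u16string keeps a trailing null
    mLength = mStorage.size();
  }
  // Copying would leave mData pointing at the source's buffer, so forbid it.
  FlatString(const FlatString&) = delete;
  FlatString& operator=(const FlatString&) = delete;
  const char16_t* get() const { return mData; }

 private:
  std::u16string mStorage;
};

// Accepts any string in the hierarchy, like a `const nsAString&` parameter.
std::size_t CountChar(const AbstractString& str, char16_t ch) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < str.Length(); ++i) {
    if (str.CharAt(i) == ch) ++count;
  }
  return count;
}
```

For example, `Substring tail(whole.get() + 2, 3)` describes three characters inside a FlatString named `whole` without copying them. `CountChar` works equally on `whole` and `tail`.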

It's important to note that nsSubstring and nsAString both represent a contiguous array of characters that is not necessarily null-terminated. One might then ask why two different yet similar string classes need to exist. nsSubstring exists primarily as an optimization, since nsAString must retain binary compatibility with the frozen nsAString class that shipped with Mozilla 1.0. Up until the release of Mozilla 1.7, nsAString was capable of representing a string broken into multiple fragments. Supporting multi-fragment strings was costly and offered limited benefits, so support for them was eliminated to reduce the complexity of the string classes and improve performance. See bug 231995 for more details.

Though nsSubstring provides a more efficient interface to its underlying buffer than nsAString, nsAString is still the most commonly used class for parameter passing. This is because it is the string class corresponding to AString in XPIDL. Therefore, this string guide will continue to discuss the string classes with an emphasis on nsAString.

Since every string derives from nsAString (or nsACString), they all share a simple API. Common read-only methods:

  • .Length() - the number of code units (bytes for 8-bit string classes and PRUnichar's for 16-bit string classes) in the string.
  • .IsEmpty() - the fastest way of determining if the string has any value. Use this instead of testing string.Length() == 0.
  • .Equals(string) - TRUE if the given string has the same value as the current string.
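The three read-only methods can be illustrated with a hypothetical stand-in class built on std::u16string. ReadOnlyModel is not the Mozilla implementation; it only mirrors the method names and the meaning given above. In particular, Length() counts 16-bit code units, so a character outside the Basic Multilingual Plane counts as two.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in exposing the read-only methods described above.
class ReadOnlyModel {
 public:
  explicit ReadOnlyModel(std::u16string value) : mValue(std::move(value)) {}

  // Number of 16-bit code units, not user-perceived characters.
  std::size_t Length() const { return mValue.size(); }

  // Whether the string holds no code units at all.
  bool IsEmpty() const { return mValue.empty(); }

  // True when both strings hold the same sequence of code units.
  bool Equals(const ReadOnlyModel& other) const {
    return mValue == other.mValue;
  }

 private:
  std::u16string mValue;
};
```

For instance, a string holding the single character U+1F4A9 reports a Length() of 2, because UTF-16 stores it as a surrogate pair.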

Common methods that modify the string:

  • .Assign(string) - Assigns a new value to the string.
  • .Append(string) - Appends a value to the string.
  • .Insert(string, position) - Inserts the given string before the code unit at position.
  • .Truncate(length) - shortens the string to the given length.
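The semantics of the four modifying methods, including the rule that Insert places text before the code unit at the given position, can be sketched with another hypothetical stand-in. MutableModel is illustrative only and built on std::u16string. The bounds checks are assumptions of this sketch, not documented Mozilla behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in mirroring the modifying methods described above.
class MutableModel {
 public:
  // Replaces the whole value.
  void Assign(const std::u16string& s) { mValue = s; }

  // Adds s after the last code unit.
  void Append(const std::u16string& s) { mValue += s; }

  // Inserts s before the code unit at `position`; position == Length()
  // behaves like Append. Positions past the end are rejected here.
  void Insert(const std::u16string& s, std::size_t position) {
    assert(position <= mValue.size());
    mValue.insert(position, s);
  }

  // Shortens the string to newLength code units (must not exceed Length()).
  void Truncate(std::size_t newLength) {
    assert(newLength <= mValue.size());
    mValue.resize(newLength);
  }

  const std::u16string& Value() const { return mValue; }

 private:
  std::u16string mValue;
};
```

Starting from "world", `Insert(u"hello ", 0)` followed by `Append(u"!")` yields "hello world!", and `Truncate(5)` then leaves "hello".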

Complete documentation can be found in the Appendix.

Read-only strings

The const qualifier on a string determines whether the string is writable. If a string is declared as a const nsAString, the data in the string cannot be modified. If one tries to call a non-const method on a const string, the compiler will flag this as an error at build time.

For example:

void nsFoo::ReverseCharacters(nsAString& str) {
  nsAutoString reversedStr;
  // ... fill reversedStr with the characters of str in reverse order ...
  str.Assign(reversedStr); // modifies the string
}

The following version will not compile, because it calls Assign(), a non-const method, on a const string:

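void nsFoo::ReverseCharacters(const nsAString& str) {
  nsAutoString reversedStr;
  // ... fill reversedStr with the characters of str in reverse order ...
  str.Assign(reversedStr); // error: Assign() is not a const method
}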
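When a function must not modify its input, the usual alternative is to take the input by const reference and write the result to a separate output parameter. The sketch below shows both signatures using std::u16string as a stand-in for nsAString. ReverseInPlace and ReverseInto are hypothetical names. Note that both reverse individual 16-bit code units, so they would split surrogate pairs. That is acceptable for illustrating const-ness but not for real text processing.

```cpp
#include <string>

// In/out parameter: the function modifies the caller's string, so the
// parameter is a non-const reference (like the first example above).
void ReverseInPlace(std::u16string& str) {
  std::u16string reversed(str.rbegin(), str.rend());
  str.assign(reversed);  // allowed: str is not const
}

// Read-only input: the const reference promises not to modify `in`, so the
// result goes to a separate output parameter instead.
void ReverseInto(const std::u16string& in, std::u16string& out) {
  out.assign(in.rbegin(), in.rend());
  // in.assign(out);  // would be rejected at compile time: `in` is const
}
```

Callers that pass a string they want to keep unchanged can then use the second form. The compiler enforces that promise.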

As function parameters

The Concrete Classes - which classes to use when

Iterators

Looping with iterators - performance issues

Helper Classes and Functions

Searching strings - looking for substrings, characters, etc.

Memory Allocation - how to avoid it, which methods to use

Substrings (string fragments)

Unicode Conversion ns*CString vs. ns*String

Common Patterns

Callee-allocated Parameters

Literal Strings

String Concatenation

Local variables

Member variables

Raw Character Pointers

IDL

IDL String types

C++ Signatures

Appendix A - What class to use when

Appendix B - nsAString Reference

Revision Source

<h2 name="Introduction"> Introduction </h2>
<p>The string classes are a library of C++ classes which are used to manage buffers of Unicode and single-byte character strings. They reside in the Mozilla codebase in the xpcom/string directory.
</p><p>Abstract (interface) classes begin with "nsA" and concrete classes simply begin with "ns". Classes with "CString" in the name store 8-bit bytes (chars), which may hold single-byte ASCII strings, multibyte Unicode strings encoded in UTF-8, or strings in a (multibyte or single-byte) legacy character encoding (e.g. ISO-8859-1, Shift_JIS, GB2312, KOI8-R). All other classes simply have "String" in their name and refer to 16-bit strings made up of PRUnichars. For example, nsAString is an abstract class for storing Unicode characters in UTF-16 encoding, and nsDependentCString is a concrete class that stores an 8-bit string. Every 16-bit string class has an equivalent 8-bit string class; for example, nsCString is the 8-bit string class that corresponds to nsString.
</p><p>8-bit and 16-bit string classes have completely separate base classes but share the same APIs. As a result, you cannot assign an 8-bit string to a 16-bit string without some kind of conversion helper class or routine. For the purposes of this document, we will refer to the 16-bit string classes in class documentation; it is safe to assume that every 16-bit class has an equivalent 8-bit class.
</p>
<h2 name="String_Guidelines"> String Guidelines </h2>
<p>Follow these simple rules in your code to keep your fellow developers, reviewers, and users happy.
</p>
<ul><li> Avoid <a href="#Unicode_Conversion_ns.2ACString_vs._ns.2AString">*WithConversion</a> functions at all costs: AssignWithConversion, AppendWithConversion, EqualsWithConversion, etc.
</li><li> Use the most abstract string class that you can. Usually this is:
<ul><li> <a href="#The_Abstract_Classes">nsAString</a> for function parameters
</li><li> <a href="#The_Concrete_Classes_-_which_classes_to_use_when">nsString</a> for member variables
</li><li> <a href="#The_Concrete_Classes_-_which_classes_to_use_when">nsAutoString or nsXPIDLString</a> for local (stack-based) variables 
</li></ul>
</li><li> Use <a href="#Literal_Strings">NS_LITERAL_[C]STRING / NS_NAMED_LITERAL_[C]STRING</a> to represent literal strings (e.g. "foo") as nsAString-compatible objects.
</li><li> Use <a href="#String_Concatenation">string concatenation</a> (i.e. the "+" operator) when combining strings.
</li><li> Use <a href="#Raw_Character_Pointers">nsDependentString</a> when you have a raw character pointer that you need to convert to an nsAString-compatible string.
</li><li> Use <a href="#Substrings_.28string_fragments.29">Substring()</a> to extract fragments of existing strings.
</li><li> Use <a href="#Iterators">iterators</a> to parse and extract string fragments.
</li></ul>
<h2 name="The_Abstract_Classes"> The Abstract Classes </h2>
<p>Every string class derives from nsAString (or nsACString). This class provides the fundamental interface for access and manipulation of strings. While concrete classes derive from nsAString, nsAString itself cannot be instantiated.
</p><p>This is very similar to the idea of an "interface" that Mozilla uses to describe abstract object descriptions in the rest of the codebase. In the case of interfaces, class names begin with "nsI" where "I" refers to "Interface". In the case of strings, abstract classes begin with "nsA" and the "A" means "Abstract".
</p><p>There are a number of abstract classes which derive from nsAString. These abstract subclasses also cannot be instantiated, but they describe a string in slightly more detail than nsAString. They guarantee that the underlying implementation behind the abstract class provides specific capabilities above and beyond nsAString.
</p><p>The list below describes the main base classes. Once you are familiar with them, see <a href="#Appendix_A_-_What_class_to_use_when">Appendix A - What class to use when</a>.
</p>
<ul><li> <b>nsAString</b>: the abstract base class for all strings. It provides an API for assignment, individual character access, basic manipulation of characters in the string, and string comparison. This class corresponds to the XPIDL AString parameter type.
</li><li> <b>nsSubstring</b>: the common base class for all of the string classes. Provides optimized access to data within the string. An nsSubstring is not necessarily null-terminated. (For backwards compatibility, nsASingleFragmentString is a typedef for this string class.)
</li><li> <b>nsString</b>: builds on nsSubstring by guaranteeing null-terminated storage. This allows a method (.get()) to access the underlying character buffer. (For backwards compatibility, nsAFlatString is a typedef for this string class.)
</li></ul>
<p>The remainder of the string classes inherit from either nsSubstring or nsString. Thus, every string class is compatible with nsAString.
</p><p>It's important to note that nsSubstring and nsAString both represent a contiguous array of characters that is not necessarily null-terminated. One might then ask why two different yet similar string classes need to exist. nsSubstring exists primarily as an optimization, since nsAString must retain binary compatibility with the frozen nsAString class that shipped with Mozilla 1.0. Up until the release of Mozilla 1.7, nsAString was capable of representing a string broken into multiple fragments. Supporting multi-fragment strings was costly and offered limited benefits, so support for them was eliminated to reduce the complexity of the string classes and improve performance. See <a class="external" href="http://bugzilla.mozilla.org/show_bug.cgi?id=231995">bug 231995</a> for more details.
</p><p>Though nsSubstring provides a more efficient interface to its underlying buffer than nsAString, nsAString is still the most commonly used class for parameter passing. This is because it is the string class corresponding to AString in XPIDL. Therefore, this string guide will continue to discuss the string classes with an emphasis on nsAString.
</p><p>Since every string derives from nsAString (or nsACString), they all share a simple API.
Common read-only methods:
</p>
<ul><li> <b>.Length()</b> - the number of code units (bytes for 8-bit string classes and PRUnichar's for 16-bit string classes) in the string.
</li><li> <b>.IsEmpty()</b> - the fastest way of determining if the string has any value. Use this instead of testing string.Length() == 0.
</li><li> <b>.Equals(string)</b> - TRUE if the given string has the same value as the current string. 
</li></ul>
<p>Common methods that modify the string:
</p>
<ul><li> <b>.Assign(string)</b> - Assigns a new value to the string.
</li><li> <b>.Append(string)</b> - Appends a value to the string.
</li><li> <b>.Insert(string, position)</b> - Inserts the given string before the code unit at position.
</li><li> <b>.Truncate(length)</b> - shortens the string to the given length. 
</li></ul>
<p>Complete documentation can be found in the <a href="#Appendix_B_-_nsAString_Reference">Appendix</a>. 
</p>
<h3 name="Read-only_strings"> Read-only strings </h3>
<p>The <code>const</code> qualifier on a string determines whether the string is writable. If a string is declared as a <code>const nsAString</code>, the data in the string cannot be modified. If one tries to call a non-<code>const</code> method on a <code>const</code> string, the compiler will flag this as an error at build time.
</p><p>For example:
</p>
<pre>void nsFoo::ReverseCharacters(nsAString&amp; str) {
  nsAutoString reversedStr;
  // ... fill reversedStr with the characters of str in reverse order ...
  str.Assign(reversedStr); // modifies the string
}
</pre>
<p>The following version will not compile, because it calls <code>Assign()</code>, a non-<code>const</code> method, on a <code>const</code> string:
</p>
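<pre>void nsFoo::ReverseCharacters(const nsAString&amp; str) {
  nsAutoString reversedStr;
  // ... fill reversedStr with the characters of str in reverse order ...
  str.Assign(reversedStr); // error: Assign() is not a const method
}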
</pre>
<h3 name="As_function_parameters"> As function parameters </h3>
<h2 name="The_Concrete_Classes_-_which_classes_to_use_when"> The Concrete Classes - which classes to use when </h2>
<h2 name="Iterators"> Iterators </h2>
<h3 name="Looping_with_iterators_-_performance_issues"> Looping with iterators - performance issues </h3>
<h2 name="Helper_Classes_and_Functions"> Helper Classes and Functions </h2>
<h3 name="Searching_strings_-_looking_for_substrings.2C_characters.2C_etc."> Searching strings - looking for substrings, characters, etc. </h3>
<h3 name="Memory_Allocation_-_how_to_avoid_it.2C_which_methods_to_use"> Memory Allocation - how to avoid it, which methods to use </h3>
<h3 name="Substrings_.28string_fragments.29"> Substrings (string fragments) </h3>
<h2 name="Unicode_Conversion_ns.2ACString_vs._ns.2AString"> Unicode Conversion ns*CString vs. ns*String </h2>
<h2 name="Common_Patterns"> Common Patterns </h2>
<h3 name="Callee-allocated_Parameters"> Callee-allocated Parameters </h3>
<h3 name="Literal_Strings"> Literal Strings </h3>
<h3 name="String_Concatenation"> String Concatenation </h3>
<h3 name="Local_variables"> Local variables </h3>
<h3 name="Member_variables"> Member variables </h3>
<h3 name="Raw_Character_Pointers"> Raw Character Pointers </h3>
<h2 name="IDL"> IDL </h2>
<h3 name="IDL_String_types"> IDL String types </h3>
<h3 name="C.2B.2B_Signatures"> C++ Signatures </h3>
<h2 name="Appendix_A_-_What_class_to_use_when"> Appendix A - What class to use when </h2>
<h2 name="Appendix_B_-_nsAString_Reference"> Appendix B - nsAString Reference </h2>