Detailed XPCOM hashtable guide

  • Revision slug: Detailed_XPCOM_hashtable_guide
  • Revision title: Detailed XPCOM hashtable guide
  • Revision id: 64373
  • Created:
  • Creator: Tservo
  • Is current revision? No
  • Comment /* PLDHash (JSDHash) */

Revision Content

What Is a Hashtable?

A hashtable is a data construct that stores a set of items. Each item has a key that identifies the item. Items are found, added, and removed from the hashtable by using the key. Hashtables may seem like arrays, but there are important differences:

Array Hashtable
Keys: integer: arrays are always keyed on integers, and must be contiguous. any type: almost any datatype can be used as key, including strings, integers, XPCOM interface pointers, IIDs, and almost anything else. Keys can be disjunct (i.e. you can store entries with keys 1, 5, and 3000).
Lookup Time: O(1): lookup time is a simple constant O(1): lookup time is mostly-constant, but the constant time can be larger than an array lookup
Sorting: sorted: stored sorted; enumerated in a sorted fashion. unsorted: stored unsorted; cannot be enumerated in a sorted manner.
Inserting/Removing: O(n): adding and removing items from a large array can be time-consuming O(1): adding and removing items from hashtables is a quick operation
Wasted space: none: Arrays are packed structures, so there is no wasted space. some: hashtables are not packed structures; depending on on the implementation, there may be significant wasted memory.

In their implementation, hashtables take the key and apply a mathematical hash function to “randomize�? the key and then use the hash to find the location in the hashtable. Good hashtable implementations will automatically resize the hashtable in memory if extra space is needed, or if too much space has been allocated.

When Should I Use a Hashtable?

Hashtables are useful for

  • sets of data that need swift random access;
  • with non-integral keys or non-contiguous integral keys;
  • or where items will be frequently added or removed.

Hashtables should not be used for

  • Sets that need to be sorted;
  • Very small datasets (less than 12-16 items);
  • Data that does not need random access.

In these situations, an array, a linked-list, or various tree data structures are more efficient.

Mozilla's Hashtable Implementations

Mozilla has several hashtable implementations, which have been tested and, tuned, and hide the inner complexities of hashtable implementations:

  • PLDHash - low-level C API; stores keys and data in one large memory structure; uses the heap efficiently; client must declare an "entry class" and may not hold onto entry pointers.
  • PLHashTable - low-level C API; entry class pointers are constant; more efficient for large entry structures; often wastes memory making many small heap allocations.
  • nsTHashtable - low-level C++ wrapper around PLDHash; generates callback functions and handles most casting automagically. Client writes their own entry class which can include complex key and data types.
  • nsDataHashtable/nsInterfaceHashtable/nsClassHashtable - high-level C++ wrappers around PLDHash; simplifies the common usage pattern mapping a simple keytype to a simple datatype; client does not need to declare or manage an entry class; nsDataHashtable datatype is a scalar such as RUint32; nsInterfaceHashtable datatype is an interface; nsClassHashtable datatype is a class pointer owned by the hashtable.

Which Hashtable Should I Use?

Key Type:
integer String/CString nsID nsISupports* Complex
Data Type: None (Hash Set) nsInt32HashSet ns(C)StringHashSet nsTHashtable<...>
Simple (PRUint32) nsDataHashtable nsTHashtable<...>
<nsUInt32HashKey,
PRUint32>
<ns(C)StringHashKey,
PRUint32>
<nsIDHashKey,
PRUint32>
<nsISupportsHashKey,
PRUint32>
Interface (nsISupports) nsInterfaceHashtable
<nsUInt32HashKey,
nsISupports>
<ns(C)StringHashKey,
nsISupports>
<nsIDHashKey,
nsISupports>
<nsISupportsHashKey,
nsISupports>
Class (nsString*) nsClassHashtable
<nsUInt32HashKey,
nsString>
<ns(C)StringHashKey,
nsString>
<nsIDHashKey,
nsString>
<nsISupportsHashKey,
nsString>
Complex
(structures, etc.)
nsTHashtable<...>

PLDHash (JSDHash)

PLDHash and JSDHash are the same thing; one is linked from the XPCOM libraries, the other from the JS libraries. JSDHash is used extensively in the SpiderMonkey JS engine.

The PLDHash implementation is a fairly low-level implementation, written in C. It is extremely flexible, but requires some time to understand and use. A basic guide is included here, but you should read most of xpcom/ds/pldhash.h if you intend to use PLDHash. The C++ wrappers for PLDHash (see below) are often much easier and safer to use in C++ code, as many potential casting errors are easily avoided.

You must declare an entry struct type, deriving from PLDHashEntryHdr. This entry struct should contain whatever data you wish to store in the hashtable (any pointer or fixed-length data type). Note: because of the double-hashing implementation, entries may move in memory when the hashtable is altered. If you need entry pointers to remain constant, you may want to consider using PLHashTable instead.

You must also initialize a PLDHashTableOps structure. This serves similarly to a vtable in C++, with pointers to appropriate user-defined functions that initialize, compare, and match entries. Because PLDHash does not know what datatype your key is, all functions that work with keys are declared using const void*, and your client code must cast these pointers to the appropriate type.

PLDHashTables can be allocated on the stack or the heap:

PLHashTable

nsTHashtable

nsBaseHashtable and friends: nsDataHashtable, nsInterfaceHashtable, and nsClassHashtable

nsHashSets

Future Plans

nsTHashSet

nsClassIHashtable

nsISimpleEnumerator support

Hash Functions

Mozilla's Old/Obsolete/Deprecated/Decrepit Hashtables

nsHashtable

nsObjectHashtable

nsSupportsHashtable

nsDoubleHashtable

Original Document Information

  • Author(s): Benjamin Smedberg <bsmedberg@covad.net>

Revision Source

<p>
</p>
<h2 name="What_Is_a_Hashtable.3F"> What Is a Hashtable? </h2>
<p>A hashtable is a data construct that stores a set of <b>items</b>. Each item has a <b>key</b> that identifies the item. Items are found, added, and removed from the hashtable by using the key. Hashtables may seem like <a href="en/XPCOM/Arrays">arrays</a>, but there are important differences:
</p>
<table class="standard-table">
<tbody><tr>
<th></th>
<th class="header">Array</th>
<th class="header">Hashtable</th>
</tr>
<tr>
<td class="header">Keys:</td>
<td><i>integer:</i> arrays are always keyed on integers, and must be contiguous.</td>
<td><i>any type:</i> almost any datatype can be used as key, including strings, integers, XPCOM interface pointers, IIDs, and almost anything else. Keys can be disjunct (i.e. you can store entries with keys 1, 5, and 3000).</td>
</tr>
<tr>
<td class="header">Lookup Time:</td>
<td><i>O(1):</i> lookup time is a simple constant</td>
<td><i>O(1):</i> lookup time is mostly-constant, but the constant time can be larger than an array lookup</td>
</tr>
<tr>
<td class="header">Sorting:</td>
<td><i>sorted:</i> stored sorted; enumerated in a sorted fashion.</td>
<td><i>unsorted:</i> stored unsorted; cannot be enumerated in a sorted manner.</td>
</tr>
<tr>
<td class="header">Inserting/Removing:</td>
<td><i>O(n):</i> adding and removing items from a large array can be time-consuming</td>
<td><i>O(1):</i> adding and removing items from hashtables is a quick operation</td>
</tr>
<tr>
<td class="header">Wasted space:</td>
<td><i>none:</i> Arrays are packed structures, so there is no wasted space.</td>
<td><i>some:</i> hashtables are not packed structures; depending on on the implementation, there may be significant wasted memory.</td>
</tr>
</tbody></table>
<p>In their implementation, hashtables take the key and apply a mathematical <b>hash function</b> to “randomize�? the key and then use the hash to find the location in the hashtable. Good hashtable implementations will automatically resize the hashtable in memory if extra space is needed, or if too much space has been allocated.
</p>
<h2 name="When_Should_I_Use_a_Hashtable.3F"> When Should I Use a Hashtable? </h2>
<p>Hashtables are useful for
</p>
<ul><li> sets of data that need swift <b>random access</b>;
</li><li> with <b>non-integral keys</b> or <b>non-contiguous integral keys</b>;
</li><li> or where <b>items will be frequently added or removed</b>. 
</li></ul>
<p>Hashtables should <i>not</i> be used for
</p>
<ul><li> Sets that need to be <b>sorted</b>;
</li><li> Very small datasets (less than 12-16 items);
</li><li> Data that does not need random access. 
</li></ul>
<p>In these situations, an array, a linked-list, or various tree data structures are more efficient.
</p>
<h2 name="Mozilla.27s_Hashtable_Implementations"> Mozilla's Hashtable Implementations </h2>
<p>Mozilla has several hashtable implementations, which have been tested and, tuned, and hide the inner complexities of hashtable implementations:
</p>
<ul><li> <code><a href="#PLDHash_.28JSDHash.29">PLDHash</a></code> - low-level C API; stores keys and data in one large memory structure; uses the heap efficiently; client must declare an "entry class" and may not hold onto entry pointers.
</li><li> <code><a href="#PLHashTable">PLHashTable</a></code> - low-level C API; entry class pointers are constant; more efficient for large entry structures; often wastes memory making many small heap allocations.
</li><li> <code><a href="#nsTHashtable">nsTHashtable</a></code> - low-level C++ wrapper around <code>PLDHash</code>; generates callback functions and handles most casting automagically. Client writes their own entry class which can include complex key and data types.
</li><li> <code><a href="#nsBaseHashtable_and_friends:_nsDataHashtable.2C_nsInterfaceHashtable.2C_and_nsClassHashtable">nsDataHashtable/nsInterfaceHashtable/nsClassHashtable</a></code> - high-level C++ wrappers around <code>PLDHash</code>; simplifies the common usage pattern mapping a simple keytype to a simple datatype; client does not need to declare or manage an entry class; <code><b>nsDataHashtable</b></code> datatype is a scalar such as <code>RUint32</code>; <code><b>nsInterfaceHashtable</b></code> datatype is an interface; <code><b>nsClassHashtable</b></code> datatype is a class pointer owned by the hashtable.
</li></ul>
<h3 name="Which_Hashtable_Should_I_Use.3F"> Which Hashtable Should I Use? </h3>
<table class="standard-table">
<tbody><tr>
<th class="header" colspan="2" rowspan="2"></th>
<th class="header" colspan="5">Key Type:</th>
</tr>
<tr>
<th class="header">integer</th>
<th class="header">String/CString</th>
<th class="header">nsID</th>
<th class="header">nsISupports*</th>
<th class="header">Complex</th>
</tr>
<tr>
<td class="header" rowspan="8">Data Type:</td>
<td class="header">None (Hash Set)</td>
<td><code>nsInt32HashSet</code></td>
<td><code>ns(C)StringHashSet</code></td>
<td colspan="3"><code>nsTHashtable&lt;...&gt;</code></td>
</tr>
<tr>
<td class="header" rowspan="2">Simple (PRUint32)</td>
<td colspan="4"><code>nsDataHashtable</code>
</td><td rowspan="6"><code>nsTHashtable&lt;...&gt;</code></td>
</tr>
<tr>
<td><code>&lt;nsUInt32HashKey,<br>PRUint32&gt;</code></td>
<td><code>&lt;ns(C)StringHashKey,<br>PRUint32&gt;</code></td>
<td><code>&lt;nsIDHashKey,<br>PRUint32&gt;</code></td>
<td><code>&lt;nsISupportsHashKey,<br>PRUint32&gt;</code></td>
</tr>
<tr>
<td class="header" rowspan="2">Interface (nsISupports)</td>
<td colspan="4"><code>nsInterfaceHashtable</code>
</td></tr>
<tr>
<td><code>&lt;nsUInt32HashKey,<br>nsISupports&gt;</code></td>
<td><code>&lt;ns(C)StringHashKey,<br>nsISupports&gt;</code></td>
<td><code>&lt;nsIDHashKey,<br>nsISupports&gt;</code></td>
<td><code>&lt;nsISupportsHashKey,<br>nsISupports&gt;</code></td>
</tr>
<tr>
<td class="header" rowspan="2">Class (nsString*)</td>
<td colspan="4"><code>nsClassHashtable</code></td>
</tr>
<tr>
<td><code>&lt;nsUInt32HashKey,<br>nsString&gt;</code></td>
<td><code>&lt;ns(C)StringHashKey,<br>nsString&gt;</code></td>
<td><code>&lt;nsIDHashKey,<br>nsString&gt;</code></td>
<td><code>&lt;nsISupportsHashKey,<br>nsString&gt;</code></td>
</tr>
<tr>
<td class="header">Complex<br>(structures, etc.)</td>
<td colspan="5"><code>nsTHashtable&lt;...&gt;</code></td>
</tr>
</tbody></table>
<h3 name="PLDHash_.28JSDHash.29"> PLDHash (JSDHash) </h3>
<p><code>PLDHash</code> and <code>JSDHash</code> are the same thing; one is linked from the XPCOM libraries, the other from the JS libraries. <code>JSDHash</code> is used extensively in the SpiderMonkey JS engine.
</p><p>The <code>PLDHash</code> implementation is a fairly low-level implementation, written in C. It is extremely flexible, but requires some time to understand and use. A basic guide is included here, but you should read most of <a class="external" href="http://lxr.mozilla.org/mozilla/source/xpcom/ds/pldhash.h">xpcom/ds/pldhash.h</a> if you intend to use <code>PLDHash</code>. The C++ wrappers for <code>PLDHash</code> (see below) are often much easier and safer to use in C++ code, as many potential casting errors are easily avoided.
</p><p>You must declare an <b>entry struct</b> type, deriving from <code><a class="external" href="http://lxr.mozilla.org/mozilla/source/xpcom/ds/pldhash.h#88">PLDHashEntryHdr</a></code>. This entry struct should contain whatever data you wish to store in the hashtable (any pointer or fixed-length data type). <b>Note:</b> because of the double-hashing implementation, entries may move in memory when the hashtable is altered. If you need entry pointers to remain constant, you may want to consider using <code><a href="#PLHashTable">PLHashTable</a></code> instead.
</p><p>You must also initialize a <code><a class="external" href="http://lxr.mozilla.org/mozilla/source/xpcom/ds/pldhash.h#330">PLDHashTableOps</a></code> structure. This serves similarly to a vtable in C++, with pointers to appropriate user-defined functions that initialize, compare, and match entries. Because <code>PLDHash</code> does not know what datatype your key is, all functions that work with keys are declared using <code><a class="external" href="http://lxr.mozilla.org/mozilla/source/xpcom/ds/pldhash.h#82">const void*</a></code>, and your client code must cast these pointers to the appropriate type.
</p><p>PLDHashTables can be allocated on the stack or the heap:
</p>
<ul><li> When allocated on the stack, or as a C++ class member, the table must be initialized using <code><a class="external" href="http://lxr.mozilla.org/mozilla/source/xpcom/ds/pldhash.h#417">PL_DHashTableInit</a></code>, and finalized using <code><a class="external" href="http://lxr.mozilla.org/mozilla/source/xpcom/ds/pldhash.h#449">PL_DHashTableFinish</a></code>;
</li><li> When allocated on the heap, use <code><a class="external" href="http://lxr.mozilla.org/mozilla/source/xpcom/ds/pldhash.h#400">PL_NewDHashTable</a></code> and <code><a class="external" href="http://lxr.mozilla.org/mozilla/source/xpcom/ds/pldhash.h#408">PL_DHashTableDestroy</a></code> to allocate and delete the table.
</li></ul>
<h3 name="PLHashTable"> PLHashTable </h3>
<h3 name="nsTHashtable"> nsTHashtable </h3>
<h3 name="nsBaseHashtable_and_friends:_nsDataHashtable.2C_nsInterfaceHashtable.2C_and_nsClassHashtable"> nsBaseHashtable and friends: nsDataHashtable, nsInterfaceHashtable, and nsClassHashtable </h3>
<h3 name="nsHashSets"> nsHashSets </h3>
<h2 name="Future_Plans"> Future Plans </h2>
<h3 name="nsTHashSet"> nsTHashSet </h3>
<h3 name="nsClassIHashtable"> nsClassIHashtable </h3>
<h3 name="nsISimpleEnumerator_support"> nsISimpleEnumerator support </h3>
<h2 name="Hash_Functions"> Hash Functions </h2>
<h2 name="Mozilla.27s_Old.2FObsolete.2FDeprecated.2FDecrepit_Hashtables"> Mozilla's Old/Obsolete/Deprecated/Decrepit Hashtables </h2>
<h3 name="nsHashtable"> nsHashtable </h3>
<h3 name="nsObjectHashtable"> nsObjectHashtable </h3>
<h3 name="nsSupportsHashtable"> nsSupportsHashtable </h3>
<h3 name="nsDoubleHashtable"> nsDoubleHashtable </h3>
<div class="originaldocinfo">
<h2 name="Original_Document_Information"> Original Document Information </h2>
<ul><li> Author(s): Benjamin Smedberg &lt;bsmedberg@covad.net&gt;
</li></ul>
</div>
Revert to this revision