GC Rooting Guide

  • Revision slug: SpiderMonkey/GC_Rooting_Guide
  • Revision title: GC Rooting Guide
  • Revision id: 305805
  • Created:
  • Creator: terrence
  • Is current revision? No
  • Comment

Revision Content

Warning: the new stack rooting API is still in heavy development, this should not be taken as final until SpiderMonkey consensus is reached.

Introduction

This guide explains the basics of interacting with SpiderMonkey's GC.  Since SpiderMonkey has a moving GC, it very important that SpiderMonkey know about each and every GCPointer in the system.  The method you should use to keep SpiderMonkey up-to-date with this information will vary depending on where the GCPointer is going to be stored.  GCPointers can be stored as a property of a GCThing, on the CHeap, or in the CStack.  These three storage regions have different lifetime and overhead characteristics and thus require different management strategies for efficient space and CPU utilization.  Since C++ does not reliably distinguish between storage classes, we have to put this burden on the user.

Storing a GCPointer in a GCThing

Storing your GCPointer inside of a GCThing that is already in the LiveSet is the easiest ways to keep a GCThing to the LiveSet.  GCPointers that reside in GCThings fall into one of these cases: storage in a normal property, storage in a reserved slot, or storage with JS_{Get|Set}Private.  Do not use JS_SetPrivate.  If you see a GCPointer stored as a private, please file a bug.  The private field is frequently used to store things that are not GCPointers, so the GC cannot automatically handle this slot.  This means it must be manually traced by the object's owner: this is both fragile and more expensive than using an extra reserved slot, or even just putting a new property on the object.

JS::Class FooClass = {
    "FooPrototype",          /* name */
    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_RESERVED_SLOTS(1), /* flags */
    JS_PropertyStub,         /* addProperty */
    JS_PropertyStub,         /* delProperty */
    ... etc ...
};
JS::Root<JSObject> obj;
JS_NewObject(cx, &FooClass, JS::NullPtr(), JS::NullPtr(), &obj);
JS::Root<Value> v = ....;
JS_SetReservedSlot(obj, 0, v);

FIXME: Value is too small for fully generic pointers....

TODO: example setProperty

Storing a GCPointer on the CHeap

TODO: Tracing and HeapPtr

Storing a GCPointer on the CStack

GCPointers stored on the CStack are special.  Unlike other GCPointers, they are not traced as part of the GC.  Instead, every pointer on the CStack should be considered part of the RootSet.  To that end, we have a special "Rooting" mechanism for stack pointers that is very efficiently able to add and remove GCPointers to the RootedSet.  The downside of this efficiency is that GCPointers must be added to and removed from this RootedSet tracking structure in LIFO order.  For this reason it is highly recommended that this rooting mechanism only be used on the CStack.

JS::Root<T>

Because of the LIFO restriction, using this interface manually is extremely cumbersome.  Instead, SpiderMonkey has a convenient suite of C++ RAII classes to do this for you, intuitively called JS::Root<T>:

JS::Root<JS::Value> v1(JS_GetNaNValue(cx));
JS::Root<JSString> str(JS_ValueToString(cx, v1));
JS::Root<JSObject> obj;
JS::Root<JSFunction> fun;
JS::Root<JSScript> script;
JS::Root<jsid> jsid;

Note 1:  C++ insists that an initializing assignment (e.g. the default constructor followed by operator=) must have a copy constructor available, even if it is not used.  Since it is never correct to implicitly copy a JS::Root (more on this in a second) we have deleted the copy constructor from these classes.  Therefore, we are forced to force you to initialize with constructor syntax.  Sorry.

Note 2: In C++, destructors are required to run in declaration order, nicely maintaining our LIFO property.  However, this does not hold true for temporaries: they are undeclared, after all.  Some compilers maintain LIFO order, some do not.  Incorrect use of JS::Root as a temporary will crash at run time.

JS::Handle<T>

Any single GCThing that enters SpiderMonkey is likely to get passed through a fairly deep call-stack before getting used.  Even though creating a new JS::Root<T> is cheap, it is silly to repeat this work for every new call frame.  To solve this, we use the standard "handle" approach.  A JS::Handle<T> is a pointer to a JS::Root<T>.  A JS::Root<T> will automatically convert to a JS::Handle<T>.  By using JS::Handle<T> in the interface instead of direct GCPointers, we ensure that the GCThing is already rooted on some previous stack frame, freeing us from having to worry about rooting the GCThing for the duration of the call.

bool Hello(JSContext *cx, JS::Handle<JSObject> foo)
{
    return Hello(cx, foo);
}

.....

void OtherFunction(JSContext *cx)
{
    ...
    JS::Root<JSObject> obj(...);
    Hello(cx, obj);
    ...
}

A secondary benefit: since all JS API functions take JS::Handle<T>s, code that does not root CStack GCPointers correctly will be much less likely to compile.

JS::MutableHandle<T>

The purpose of JS::MutableHandle<T> is to allow out-parameters for rooted GCThings.  This class solves two problems:

  1. C++ only allows one level of user-defined implicit type conversion.  When passed to an out-param, &handle would convert correctly to JS::Handle<T>*, but &root is a compile error.
  2. Even though JS::Handle<T> is indirect, it implicitly behaves like the thing it is pointing to.  This makes JS::Handle<T> and JS::Root<T> behave the same, allowing us to seamlessly weave JS::Handle<T>s into the JS API.  The problem is that this requires us to overload operator->.  If we have a method taking JS::Handle<Value>*, then out->setObject(foo) will not work as expected.

To solve these problems we have the JS::MutableHandle<T> class.  JS::MutableHandle<T> is a JS::Handle<T> that has a set method.

bool ReturnFoo(JSContext *cx, JS::MutableHandle<JSString> out)
{
    out.set(JS_NewStringCopyZ(cx, "Foo"));
    return bool(out);
}

size_t GetLengthFoo(JSContext *cx)
{
    JS::Root<JSString> s;
    if (ReturnFoo(cx, &s))
        return JS_GetStringLength(s);
    return size_t(-1);
}

All methods in the JS-API that return GCPointers have been changed to this out-param style.  This prevents an entire class of bugs where return values are not rooted properly before the next JS-API call.

JS::NullPtr

NULL (or nullptr) does not implicitly convert to a JS::Handle<T>.  For places where you want to pass a NULL value into a function taking a JS::Handle<T>, you can construct and pass a new JS::NullPtr instance.  You are free to use these as temporaries.

JS::Root<JSObject> obj;
bool status = JS_NewObject(cx, clasp, JS::NullPtr(), JS::NullPtr(), &obj);

Common Pitfalls

The C++ type system allows us to elimiate the possibility of most common errors, however, there are still a few things that you can get wrong that the compiler cannot help you with.  There is basically never a good reason to do any of these.  If you think you do need to do one of these, ask on one of SpiderMonkey's support forums: maybe we've already solved your problem using a different mechanism.

  1. Storing a JS::Root<T> on the heap.  It is very easy to end up in violation of the LIFO constraint if you do this.  Use a different rooting mechanism if you store a GCPointer on the heap.
  2. Storing a JS::Handle<T> on the heap.  It is very easy for the handle to outlive its root if you do this.
  3. Returning a JS::Handle<T> from a function.  If you do this, a handle may outlive its root.

Definitions

  • GC - Acronym for Garbage Collection: specifically SpiderMonkey's method of automatically management JavaScript program memory.
  • GCThing - Storage allocated by SpiderMonkey's allocator.
  • GCPointer - A raw pointer type or tagged pointer type that may refer to a GCThing (JSObject *, JSString *, Value, jsid, etc)
  • GCHeap - The graph of GCThings allocated by SpiderMonkey's allocator.  The goal of SpiderMonkey's GC is to partition this heap into regions reachable from the RootSet and not.
  • Root - A GCPointer held live because it is a member of the RootSet.
  • RootSet - This is a set of GCPointers that are de-facto alive, regardless of the GCThing's reachability.  GCPointers must be explicitly added to and removed from the RootSet.
  • LiveSet - The set of all GCThings that are reachable from Roots.
  • DeadSet - The set of all GCThings that are not reachable from Roots.
  • CHeap - The C/C++ program heap: e.g. memory allocated by malloc/calloc/realloc.
  • CStack - The C/C++ program stack.
  • rooted - A GCThing in the LiveSet is said to be "Rooted" even if it is not a Root as such, but is only reachable from Roots.  This traditional usage is ambiguous and this guide will use live or alive instead of rooted, to avoid confusion.

Caveats

Exact Rooting Transition Period

Exact stack rooting is not currently enabled by default: we are still using conservative scanning.  Until the system described above has 100% browser coverage, it is unsafe to enable it.  In the meantime, the new JS::Root class holds an object live by storing its pointer on the stack so that the conservative scanner will find it.  This distinction should not be noticable to the end user.

Performance Tweaking

In aggregate, exact and conservate stack rooting should have similar performance.  The performance tradeoffs that these techniques make, however, are quite different.  If the extra overhead of exact rooting does end up adding an unacceptable cost to a specific code path that is not mitigated by a faster GC, there are some tricks you can use to get better performance at the cost of more complex code.

  • Move JS::Root declarations above loops.  Modern C++ compilers are not smart enough to do LICM on JS::Root, so forward declaring a single JS::Root above the loop and re-using it on every iteration can save some cycles.
  • Raw pointers.  In opt builds the types RawValue/RawObject/RawString/RawId/etc are typedefs to the underlying pointer type.  If you are 100% sure that there is no way for SpiderMonkey to GC while the pointer is on the stack, this is an option.  Note: SpiderMonkey can GC because of any error, GC because of timers, GC because we are low on memory, GC because of environment variables, GC because of cosmic rays, etc.  This is not a terribly safe option for embedder code, so only consider this as a very last resort.

Revision Source

<div class="warning">
  <p id="Warning.3A_the_new_stack_rooting_API_is_still_in_heavy_development.2C_this_should_not_be_taken_as_final_until_SpiderMonkey_concensus_is_reached."><strong>Warning</strong>: the new stack rooting API is still in heavy development, this should not be taken as final until SpiderMonkey consensus is reached.</p>
</div>
<h2 id="Introduction">Introduction</h2>
<p>This guide explains the basics of interacting with SpiderMonkey's <a href="#defineGC" title="#defineGC">GC</a>.&nbsp; Since SpiderMonkey has a moving GC, it very important that SpiderMonkey know about each and every <a href="#defineGCPointer" title="#defineGCPointer">GCPointer</a> in the system.&nbsp; The method you should use to keep SpiderMonkey up-to-date with this information will vary depending on where the GCPointer is going to be stored.&nbsp; GCPointers can be stored as a property of a <a href="#defineGCThing" title="#defineGCThing">GCThing</a>, on the <a href="#defineCHeap" title="#defineCHeap">CHeap</a>, or in the <a href="#defineCStack" title="#defineCStack">CStack</a>.&nbsp; These three storage regions have different lifetime and overhead characteristics and thus require different management strategies for efficient space and CPU utilization.&nbsp; Since C++ does not reliably distinguish between storage classes, we have to put this burden on the user.</p>
<h2 id="Storing_a_GCPointer_in_a_GCThing">Storing a GCPointer in a GCThing</h2>
<p>Storing your GCPointer inside of a GCThing that is already in the <a href="#defineLiveSet" title="#defineLiveSet">LiveSet</a> is the easiest ways to keep a GCThing to the LiveSet.&nbsp; GCPointers that reside in GCThings fall into one of these cases: storage in a normal property, storage in a reserved slot, or storage with JS_{Get|Set}Private.&nbsp; Do not use JS_SetPrivate.&nbsp; If you see a GCPointer stored as a private, please file a bug.&nbsp; The private field is frequently used to store things that are not GCPointers, so the GC cannot automatically handle this slot.&nbsp; This means it must be manually traced by the object's owner: this is both fragile and more expensive than using an extra reserved slot, or even just putting a new property on the object.</p>
<pre>
JS::Class FooClass = {
&nbsp;&nbsp;&nbsp; "FooPrototype",          /* name */
&nbsp;&nbsp;&nbsp; <strike>JSCLASS_HAS_PRIVATE |</strike> JSCLASS_HAS_RESERVED_SLOTS(1), /* flags */
&nbsp;&nbsp;&nbsp; JS_PropertyStub,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /* addProperty */
&nbsp;&nbsp;&nbsp; JS_PropertyStub,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /* delProperty */
    ... etc ...
};
JS::Root&lt;JSObject&gt; obj;
JS_NewObject(cx, &amp;FooClass, JS::NullPtr(), JS::NullPtr(), &amp;obj);
JS::Root&lt;Value&gt; v = ....;
JS_SetReservedSlot(obj, 0, v);
</pre>
<p>FIXME: Value is too small for fully generic pointers....<br />
  <br />
  TODO: example setProperty</p>
<h2 id="Storing_a_GCPointer_on_the_CHeap">Storing a GCPointer on the CHeap</h2>
<p>TODO: Tracing and HeapPtr</p>
<h2 id="Storing_a_GCPointer_on_the_CStack">Storing a GCPointer on the CStack</h2>
<p>GCPointers stored on the CStack are special.&nbsp; Unlike other GCPointers, they are not traced as part of the GC.&nbsp; Instead, every pointer on the CStack should be considered part of the <a href="#defineRootSet" title="#defineRootSet">RootSet</a>.&nbsp; To that end, we have a special "Rooting" mechanism for stack pointers that is very efficiently able to add and remove GCPointers to the RootedSet.&nbsp; The downside of this efficiency is that GCPointers <em>must</em> be added to and removed from this RootedSet tracking structure in LIFO order.&nbsp; For this reason it is highly recommended that this rooting mechanism only be used on the CStack.</p>
<h3 id="JS.3A.3ARoot&lt;T&gt;">JS::Root&lt;T&gt;</h3>
<p>Because of the LIFO restriction, using this interface manually is extremely cumbersome.&nbsp; Instead, SpiderMonkey has a convenient suite of C++ RAII classes to do this for you, intuitively called <code>JS::Root&lt;T&gt;</code>:</p>
<pre>
<code>JS::</code><code>Root&lt;</code><code>JS::</code><code>Value&gt; v1(JS_GetNaNValue(cx));
</code><code>JS::</code><code>Root&lt;JSString&gt; str(JS_ValueToString(cx, v1));
</code><code>JS::</code><code>Root&lt;JSObject&gt; obj;</code>
<code>JS::</code><code>Root&lt;JSFunction&gt; fun;</code>
<code>JS::</code><code>Root&lt;JSScript&gt; script;
JS::Root&lt;jsid&gt; jsid;
</code></pre>
<div class="warning">
  <p><strong>Note 1:</strong>&nbsp; C++ insists that an initializing assignment (e.g. the default constructor followed by operator=) must have a copy constructor available, even if it is not used.&nbsp; Since it is never correct to implicitly copy a JS::Root (more on this in a second) we have deleted the copy constructor from these classes.&nbsp; Therefore, we are forced to force you to initialize with constructor syntax.&nbsp; Sorry.</p>
  <p><strong>Note 2</strong>: In C++, destructors are required to run in declaration order, nicely maintaining our LIFO property.&nbsp; However, this does not hold true for temporaries: they are undeclared, after all.&nbsp; Some compilers maintain LIFO order, some do not.&nbsp; Incorrect use of <code>JS::Root</code> as a temporary will crash at run time.</p>
</div>
<h3 id="JS.3A.3AHandle&lt;T&gt;">JS::Handle&lt;T&gt;</h3>
<p>Any single GCThing that enters SpiderMonkey is likely to get passed through a fairly deep call-stack before getting used.&nbsp; Even though creating a new <code>JS::Root&lt;T&gt;</code> is cheap, it is silly to repeat this work for every new call frame.&nbsp; To solve this, we use the standard "handle" approach.&nbsp; A <code>JS::</code><code>Handle&lt;T&gt;</code> is a pointer to a <code>JS::</code><code>Root&lt;T&gt;</code>.&nbsp; A <code>JS::Root&lt;T&gt;</code> will automatically convert to a <code>JS::Handle&lt;T&gt;</code>.&nbsp; By using <code>JS::Handle&lt;T&gt;</code> in the interface instead of direct GCPointers, we ensure that the GCThing is already rooted on some previous stack frame, freeing us from having to worry about rooting the GCThing for the duration of the call.</p>
<pre>
bool Hello(JSContext *cx, <code>JS::</code>Handle&lt;JSObject&gt; foo)
{
    return Hello(cx, foo);
}

.....

void OtherFunction(JSContext *cx)
{
    ...
    <code>JS::</code>Root&lt;JSObject&gt; obj(...);
    Hello(cx, obj);
    ...
}</pre>
<p>A secondary benefit: since all JS API functions take <code>JS::</code><code>Handle&lt;T&gt;</code>s, code that does not root CStack GCPointers correctly will be much less likely to compile.</p>
<h3 id="JS.3A.3AMutableHandle&lt;T&gt;">JS::MutableHandle&lt;T&gt;</h3>
<p>The purpose of <code>JS::MutableHandle&lt;T&gt;</code> is to allow out-parameters for rooted GCThings.&nbsp; This class solves two problems:</p>
<ol>
  <li>C++ only allows one level of user-defined implicit type conversion.&nbsp; When passed to an out-param, <code>&amp;handle</code> would convert correctly to <code>JS::Handle&lt;T&gt;*</code>, but <code>&amp;root</code> is a compile error.</li>
  <li>Even though <code>JS::</code><code>Handle&lt;T&gt;</code> is indirect, it implicitly behaves like the thing it is pointing to.&nbsp; This makes <code>JS::Handle&lt;T&gt;</code> and <code>JS::Root&lt;T&gt;</code> behave the same, allowing us to seamlessly weave <code>JS::Handle&lt;T&gt;</code>s into the JS API.&nbsp; The problem is that this requires us to overload <code>operator-&gt;</code>.&nbsp; If we have a method taking <code>JS::Handle&lt;Value&gt;*</code>, then <code>out-&gt;setObject(foo)</code> will not work as expected.</li>
</ol>
<p>To solve these problems we have the <code>JS::</code><code>MutableHandle&lt;T&gt;</code> class.&nbsp; <code>JS::</code><code>MutableHandle&lt;T&gt;</code> is a <code>JS::</code><code>Handle&lt;T&gt;</code> that has a <code>set</code> method.</p>
<pre>
bool ReturnFoo(JSContext *cx, JS::MutableHandle&lt;JSString&gt; out)
{
    out.set(JS_NewStringCopyZ(cx, "Foo"));
    return bool(out);
}

size_t GetLengthFoo(JSContext *cx)
{
    JS::Root&lt;JSString&gt; s;
    if (ReturnFoo(cx, &amp;s))
        return JS_GetStringLength(s);
    return size_t(-1);
}
</pre>
<p>All methods in the JS-API that return GCPointers have been changed to this out-param style.&nbsp; This prevents an entire class of bugs where return values are not rooted properly before the next JS-API call.</p>
<h3 id="JS.3A.3ANullPtr">JS::NullPtr</h3>
<p>NULL (or nullptr) does not implicitly convert to a <code>JS::Handle&lt;T&gt;</code>.&nbsp; For places where you want to pass a <code>NULL</code> value into a function taking a <code>JS::Handle&lt;T&gt;</code>, you can construct and pass a new <code>JS::NullPtr</code> instance.&nbsp; You are free to use these as temporaries.</p>
<pre>
JS::Root&lt;JSObject&gt; obj;
bool status = JS_NewObject(cx, clasp, JS::NullPtr(), JS::NullPtr(), &amp;obj);
</pre>
<h2 id="Common_Pitfalls">Common Pitfalls</h2>
<p>The C++ type system allows us to elimiate the possibility of most common errors, however, there are still a few things that you can get wrong that the compiler cannot help you with.&nbsp; There is basically never a good reason to do any of these.&nbsp; If you think you do need to do one of these, ask on one of SpiderMonkey's support forums: maybe we've already solved your problem using a different mechanism.</p>
<ol>
  <li><strong>Storing a <code>JS::Root&lt;T&gt;</code> on the heap.</strong>&nbsp; It is very easy to end up in violation of the LIFO constraint if you do this.&nbsp; Use a different rooting mechanism if you store a GCPointer on the heap.</li>
  <li><strong>Storing a&nbsp;<code>JS::Handle&lt;T&gt;</code> on the heap.</strong>&nbsp; It is very easy for the handle to outlive its root if you do this.</li>
  <li><strong>Returning a <code>JS::Handle&lt;T&gt;</code> from a function.</strong>&nbsp; If you do this, a handle may outlive its root.</li>
</ol>
<h2 id="Definitions">Definitions</h2>
<ul>
  <li><a name="defineGC">GC</a> - Acronym for Garbage Collection: specifically SpiderMonkey's method of automatically management JavaScript program memory.</li>
  <li><a name="defineGCThing">GCThing</a> - Storage allocated by SpiderMonkey's allocator.</li>
  <li><a name="defineGCPointer">GCPointer</a> - A raw pointer type or tagged pointer type that may refer to a GCThing (JSObject *, JSString *, Value, jsid, etc)</li>
  <li><a name="defineGCHeap">GCHeap</a> - The graph of GCThings allocated by SpiderMonkey's allocator.&nbsp; The goal of SpiderMonkey's GC is to partition this heap into regions reachable from the RootSet and not.</li>
  <li><a name="defineRoot">Root</a> - A GCPointer held live because it is a member of the RootSet.</li>
  <li><a name="defineRootSet">RootSet</a> - This is a set of GCPointers that are de-facto alive, regardless of the GCThing's reachability.&nbsp; GCPointers must be explicitly added to and removed from the RootSet.</li>
  <li><a name="defineLiveSet">LiveSet</a> - The set of all GCThings that are reachable from Roots.</li>
  <li><a name="defineDeadSet">DeadSet</a> - The set of all GCThings that are <em><strong>not</strong></em> reachable from Roots.</li>
  <li><a name="defineCHeap">CHeap</a> - The C/C++ program heap: e.g. memory allocated by malloc/calloc/realloc.</li>
  <li><a name="defineCStack">CStack</a> - The C/C++ program stack.</li>
  <li>rooted - A GCThing in the LiveSet is said to be "Rooted" even if it is not a Root as such, but is only reachable from Roots.&nbsp; This traditional usage is ambiguous and this guide will use live or alive instead of rooted, to avoid confusion.</li>
</ul>
<h2 id="Caveats">Caveats</h2>
<h3 id="Exact_Rooting_Transition_Period">Exact Rooting Transition Period</h3>
<p>Exact stack rooting is not currently enabled by default: we are still using conservative scanning.&nbsp; Until the system described above has 100% browser coverage, it is unsafe to enable it.&nbsp; In the meantime, the new <code>JS::Root</code> class holds an object live by storing its pointer on the stack so that the conservative scanner will find it.&nbsp; This distinction should not be noticable to the end user.</p>
<h2 id="Performance_Tweaking">Performance Tweaking</h2>
<p>In aggregate, exact and conservate stack rooting should have similar performance.&nbsp; The performance tradeoffs that these techniques make, however, are quite different.&nbsp; If the extra overhead of exact rooting does end up adding an unacceptable cost to a specific code path that is not mitigated by a faster GC, there are some tricks you can use to get better performance at the cost of more complex code.</p>
<ul>
  <li>Move <code>JS::Root</code> declarations above loops.&nbsp; Modern C++ compilers are not smart enough to do LICM on <code>JS::Root</code>, so forward declaring a single <code>JS::Root</code> above the loop and re-using it on every iteration can save some cycles.</li>
  <li>Raw pointers.&nbsp; In opt builds the types <code>RawValue</code>/<code>RawObject</code>/<code>RawString</code>/<code>RawId</code>/etc are typedefs to the underlying pointer type.&nbsp; If you are 100% sure that there is no way for SpiderMonkey to GC while the pointer is on the stack, this is an option.&nbsp; Note: SpiderMonkey can GC because of any error, GC because of timers, GC because we are low on memory, GC because of environment variables, GC because of cosmic rays, etc.&nbsp; This is not a terribly safe option for embedder code, so only consider this as a very last resort.</li>
</ul>
Revert to this revision