Bytecodes

  • Revision slug: SpiderMonkey/Bytecodes
  • Revision title: Bytecodes
  • Revision id: 294988
  • Created:
  • Creator: berkerpeksag
  • Is current revision? Yes
  • Comment

Revision Content

Background

SpiderMonkey bytecodes are the canonical representation of code in the JavaScript engine. The JavaScript frontend constructs an AST from the source text, then emits stack-based bytecodes from that AST as part of the JSScript data structure. Bytecodes can reference atoms and objects (typically by array index), which are also stored in the JSScript data structure.

Within the engine, all bytecode executes within a stack frame; even global (top-level) and eval code has a stack frame associated with it. A frame on the stack has space for JavaScript Values (the tagged value format) in a few different categories. The space for a single JavaScript value is called a "slot", so the categories are:

  • Argument slots: hold the actual arguments passed to the current frame.
  • Local slots: hold the local variables used by the current code.
  • Expression slots: hold the temporary space needed to evaluate expressions on a stack. For example, in (a + b) + c you would push a, then push b, then add, then push c, then add, which requires a maximum depth of two expression slots.

There are also some slots reserved for dedicated functionality, holding values like this and the callee / return value.

There is always a "Top of Stack" (TOS) that corresponds to the latest value pushed onto the expression stack. All bytecodes implicitly operate in terms of this location.

Bytecode Listing

All opcodes are annotated with [-popcount, +pushcount] to represent the overall stack effects of their execution.

If an opcode carries data encoded inline in the bytecode stream, that data is listed as parameters in a function-like invocation of the opcode, e.g. JSOP_GOTO(int16_t offset).

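
The (a + b) + c example used to describe expression slots can be traced with a small stack model. The sketch below is a toy in Python, not SpiderMonkey code: the operation names "push" and "add" and the evaluate helper are hypothetical, chosen only to show how the maximum depth of two expression slots arises.

```python
# Toy model only: "push" and "add" are hypothetical operations, not SpiderMonkey opcodes.
def evaluate(program, variables):
    """Run a tiny stack program; return (result, maximum expression-stack depth)."""
    stack = []
    max_depth = 0
    for op, operand in program:
        if op == "push":
            stack.append(variables[operand])
        elif op == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unknown operation: {op}")
        # Record the deepest point the expression stack reaches.
        max_depth = max(max_depth, len(stack))
    return stack.pop(), max_depth


# (a + b) + c: push a, push b, add, push c, add
PROGRAM = [("push", "a"), ("push", "b"), ("add", None), ("push", "c"), ("add", None)]

result, depth = evaluate(PROGRAM, {"a": 1, "b": 2, "c": 3})
print(result, depth)  # 6 2
```

The depth never exceeds two because the first add collapses a and b into a single slot before c is pushed.
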
JSOP_NOP [-0, +0]
A no-operation bytecode. This was historically used to blacklist loops in TraceMonkey (switching a JSOP_LOOPHEAD to a JSOP_NOP) and (FIXME) apparently still has some relevance for the decompiler.
JSOP_PUSH [-0, +1]
Pushes undefined onto the stack.
JSOP_POPV [-1, +0]
Pops the top stack value into the return-value slot for the currently executing frame.
JSOP_ENTERWITH [-1, +1]
Turns the value at TOS into a member of the scope chain. Pops the value and pushes the new scope-chain object. The fact that the new scope-chain object gets pushed is a bit scary, because it is an engine-only internal data structure, and it ends up on the stack with all the user data.
JSOP_LEAVEWITH [-1, +0]
Pops the scope-chain object from JSOP_ENTERWITH off TOS.
JSOP_RETURN [-1, +0, STOPS]
Pops the TOS into the return value slot and returns the currently executing code to its caller.
JSOP_GOTO(int16_t offset) [-0, +0, JUMPS]
Jumps to the location given by a signed 16-bit offset relative to the current bytecode.
JSOP_IFEQ(int16_t offset) [-1, +0, JUMPS]
Pops a value from TOS, converts it to a boolean, and, if the result is false, jumps to the location given by a signed 16-bit offset relative to the current bytecode. The idea is that a sequence like JSOP_ZERO; JSOP_ZERO; JSOP_EQ; JSOP_IFEQ; JSOP_RETURN; reads as a linear sequence that executes the return: 0 == 0 is true, so JSOP_IFEQ does not jump and execution falls through to JSOP_RETURN.
JSOP_IFNE(int16_t offset) [-1, +0, JUMPS]
Same as JSOP_IFEQ, but jumps if the result is true.
JSOP_ARGUMENTS [-0, +1]
Pushes the arguments object (corresponding to arguments) for the current frame. Arguments objects are created lazily.
JSOP_SWAP [-2, +2]
Swaps the top two values on the stack. This is useful for things like post-increment/decrement.
JSOP_POPN(uint16_t N) [-N, +0]
Pops the top N values on the stack in a single opcode.
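
To make the stack annotations and jump operands concrete, here is a minimal sketch of an interpreter loop for a few of the opcodes listed above, written in Python. It is an illustration under assumptions, not SpiderMonkey's implementation: the opcode numbers, the big-endian operand encoding, the run and jump helpers, the UNDEFINED sentinel, and the simplified boolean conversion and equality are all hypothetical. JSOP_ZERO and JSOP_EQ are modeled as "push 0" and "pop two values, push their comparison", which is what the JSOP_IFEQ description implies.

```python
import struct

# Hypothetical opcode numbers for this sketch; the real values come from the engine's opcode table.
NOP, PUSH, POPV, RETURN, GOTO, IFEQ, IFNE, SWAP, POPN, ZERO, EQ = range(11)

UNDEFINED = object()  # stand-in for the JavaScript undefined value


def jump(opcode, offset):
    """Encode a jump: opcode byte plus a signed 16-bit offset (big-endian is an assumption)."""
    return bytes([opcode]) + struct.pack(">h", offset)


def to_boolean(value):
    """Simplified boolean conversion; real JavaScript rules cover more cases."""
    return value is not UNDEFINED and bool(value)


def run(code):
    """Interpret toy bytecode; return the value left in the frame's return-value slot."""
    stack = []
    rval = UNDEFINED
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == NOP:                     # [-0, +0]
            pc += 1
        elif op == PUSH:                  # [-0, +1]
            stack.append(UNDEFINED)
            pc += 1
        elif op == ZERO:                  # pushes 0 (modeled from the JSOP_IFEQ example)
            stack.append(0)
            pc += 1
        elif op == EQ:                    # Python equality, not JavaScript's == semantics
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs == rhs)
            pc += 1
        elif op == POPV:                  # [-1, +0]
            rval = stack.pop()
            pc += 1
        elif op == RETURN:                # [-1, +0, STOPS]
            rval = stack.pop()
            return rval
        elif op == SWAP:                  # exchanges the top two values
            stack[-1], stack[-2] = stack[-2], stack[-1]
            pc += 1
        elif op == POPN:                  # [-N, +0], N is an unsigned 16-bit operand
            (count,) = struct.unpack_from(">H", code, pc + 1)
            del stack[len(stack) - count:]
            pc += 3
        elif op in (GOTO, IFEQ, IFNE):    # offset is relative to the jump opcode itself
            (offset,) = struct.unpack_from(">h", code, pc + 1)
            if op == GOTO:
                taken = True
            else:
                truthy = to_boolean(stack.pop())
                taken = truthy if op == IFNE else not truthy
            pc += offset if taken else 3
        else:
            raise ValueError(f"unknown opcode {op} at pc {pc}")
    return rval


# ZERO; ZERO; EQ; IFEQ +5; ZERO; RETURN; PUSH; RETURN
fall_through = bytes([ZERO, ZERO, EQ]) + jump(IFEQ, 5) + bytes([ZERO, RETURN, PUSH, RETURN])
# Same layout, but IFNE jumps when the condition is true.
taken = bytes([ZERO, ZERO, EQ]) + jump(IFNE, 5) + bytes([ZERO, RETURN, PUSH, RETURN])

print(run(fall_through))        # 0
print(run(taken) is UNDEFINED)  # True
```

Because 0 == 0 is true, the JSOP_IFEQ variant does not jump and returns the 0 pushed right after it, while the JSOP_IFNE variant skips five bytes ahead and returns undefined.
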

Revision Source

<h2 id="Background">Background</h2>
<p>SpiderMonkey bytecodes are the canonical representation of code in the JavaScript engine. The JavaScript frontend constructs an AST from the source text, then emits stack-based bytecodes from that AST as part of the JSScript data structure. Bytecodes can reference atoms and objects (typically by array index), which are also stored in the JSScript data structure.</p>
<p>Within the engine, all bytecode executes within a stack frame; even global (top-level) and eval code has a stack frame associated with it. A frame on the stack has space for JavaScript Values (the tagged value format) in a few different categories. The space for a single JavaScript value is called a "slot", so the categories are:</p>
<ul>
  <li>Argument slots: hold the actual arguments passed to the current frame.</li>
  <li>Local slots: hold the local variables used by the current code.</li>
  <li>Expression slots: hold the temporary space needed to evaluate expressions on a stack. For example, in <code>(a + b) + c</code> you would push a, then push b, then add, then push c, then add, which requires a maximum depth of two expression slots.</li>
</ul>
<p>There are also some slots reserved for dedicated functionality, holding values like <code>this</code> and the callee / return value.</p>
<p>There is always a "Top of Stack" (TOS) that corresponds to the latest value pushed onto the expression stack. All bytecodes implicitly operate in terms of this location.</p>
<h2 id="Bytecode_Listing">Bytecode Listing</h2>
<p>All opcodes are annotated with [-popcount, +pushcount] to represent the overall stack effects of their execution.</p>
<p>If an opcode carries data encoded inline in the bytecode stream, that data is listed as parameters in a function-like invocation of the opcode, e.g. <code>JSOP_GOTO(int16_t offset)</code>.</p>
<dl>
  <dt>
    JSOP_NOP [-0, +0]</dt>
  <dd>
    A no-operation bytecode. This was historically used to blacklist loops in TraceMonkey (switching a JSOP_LOOPHEAD to a JSOP_NOP) and (FIXME) apparently still has some relevance for the decompiler.</dd>
  <dt>
    JSOP_PUSH [-0, +1]</dt>
  <dd>
    Pushes undefined onto the stack.</dd>
  <dt>
    JSOP_POPV [-1, +0]</dt>
  <dd>
    Pops the top stack value into the return-value slot for the currently executing frame.</dd>
  <dt>
    JSOP_ENTERWITH [-1, +1]</dt>
  <dd>
    Turns the value at TOS into a member of the scope chain. Pops the value and pushes the new scope-chain object. The fact that the new scope-chain object gets pushed is a bit scary, because it is an engine-only internal data structure, and it ends up on the stack with all the user data.</dd>
  <dt>
    JSOP_LEAVEWITH [-1, +0]</dt>
  <dd>
    Pops the scope-chain object from JSOP_ENTERWITH off TOS.</dd>
  <dt>
    JSOP_RETURN [-1, +0, STOPS]</dt>
  <dd>
    Pops the TOS into the return value slot and returns the currently executing code to its caller.</dd>
  <dt>
    JSOP_GOTO(int16_t offset) [-0, +0, JUMPS]</dt>
  <dd>
    Jumps to the location given by a signed 16-bit offset relative to the current bytecode.</dd>
  <dt>
    JSOP_IFEQ(int16_t offset) [-1, +0, JUMPS]</dt>
  <dd>
    Pops a value from TOS, converts it to a boolean, and, if the result is false, jumps to the location given by a signed 16-bit offset relative to the current bytecode. The idea is that a sequence like JSOP_ZERO; JSOP_ZERO; JSOP_EQ; JSOP_IFEQ; JSOP_RETURN; reads as a linear sequence that executes the return: 0 == 0 is true, so JSOP_IFEQ does not jump and execution falls through to JSOP_RETURN.</dd>
  <dt>
    JSOP_IFNE(int16_t offset) [-1, +0, JUMPS]</dt>
  <dd>
    Same as JSOP_IFEQ, but jumps if the result is true.</dd>
  <dt>
    JSOP_ARGUMENTS [-0, +1]</dt>
  <dd>
    Pushes the arguments object (corresponding to <code>arguments</code>) for the current frame. Arguments objects are created lazily.</dd>
  <dt>
    JSOP_SWAP [-2, +2]</dt>
  <dd>
    Swaps the top two values on the stack. This is useful for things like post-increment/decrement.</dd>
  <dt>
    JSOP_POPN(uint16_t N) [-N, +0]</dt>
  <dd>
    Pops the top N values on the stack in a single opcode.</dd>
</dl>