Hacking Tips

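When you need to dig into SpiderMonkey to analyze or debug a bug, the tips collected here may help. All tips in this document assume a SpiderMonkey built by following the build documentation of SpiderMonkey. The document has two parts: the first covers debugging tips, and the second covers how to build an optimization.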


Printing help information (from the JS shell)


Getting the bytecode of a function (from the JS shell)


js> function f () {
  return 1;
}
js> dis(f);
loc     op
-----   --
00000:  one
00001:  return
00002:  stop

Source notes:
 ofs  line    pc  delta desc     args
---- ---- ----- ------ -------- ------
  0:    1     0 [   0] newline
  1:    2     0 [   0] colspan 2
  3:    2     2 [   2] colspan 9


In jsopcode.cpp, a function named js_DisassembleAtPC can print the bytecode of a script.  Some variants of this function such as js_DumpPc, js_DumpScript and js_DumpScriptDepth are convenient for debugging.


Printing the JS stack. (from gdb)

In jsobj.cpp, a function named js_DumpBacktrace prints a gdb-style backtrace of the JS stack. Each line of the backtrace contains, in order: the stack depth; the interpreter frame pointer (see js/src/vm/Stack.h, StackFrame class), or (nil) if the frame was compiled with IonMonkey; the file and line number of the call location; and, in parentheses, the JSScript pointer and the jsbytecode pointer (pc) being executed.

$ gdb --args js
(gdb) b js_ReportOverRecursed
(gdb) r
js> function f(i) {
  if (i % 2) f(i + 1);
  else f(i + 3);
}
js> f(0)

Breakpoint 1, js_ReportOverRecursed (maybecx=0xfdca70) at /home/nicolas/mozilla/ionmonkey/js/src/jscntxt.cpp:495
495         if (maybecx)
(gdb) call js_DumpBacktrace(maybecx)
#0          (nil)   typein:2 (0x7fffef1231c0 @ 0)
#1          (nil)   typein:2 (0x7fffef1231c0 @ 24)
#2          (nil)   typein:3 (0x7fffef1231c0 @ 47)
#3          (nil)   typein:2 (0x7fffef1231c0 @ 24)
#4          (nil)   typein:3 (0x7fffef1231c0 @ 47)
#25157 0x7fffefbbc250   typein:2 (0x7fffef1231c0 @ 24)
#25158 0x7fffefbbc1c8   typein:3 (0x7fffef1231c0 @ 47)
#25159 0x7fffefbbc140   typein:2 (0x7fffef1231c0 @ 24)
#25160 0x7fffefbbc0b8   typein:3 (0x7fffef1231c0 @ 47)
#25161 0x7fffefbbc030   typein:5 (0x7fffef123280 @ 9)


Setting a breakpoint in the generated code. (from gdb, x86 / x86-64)

To set a breakpoint in the generated code of a specific JSScript compiled with IonMonkey (this also works with JägerMonkey, although the functions involved are different), first set a gdb breakpoint on the CodeGenerator function of the instruction you are interested in. If you have no precise idea which function to look at, set the breakpoint on js::ion::CodeGenerator::visitStart. Optionally, add a condition on the ins->id() of the LIR instruction to select precisely the instruction you are looking for. Once the breakpoint is on the CodeGenerator function of the LIR instruction, attach a command to it that emits a static breakpoint into the generated code.

$ gdb --args js
(gdb) b js::ion::CodeGenerator::visitStart
(gdb) command
>call masm.breakpoint()
>continue
>end
(gdb) r
js> function f(a, b) { return a + b; }
js> for (var  i = 0; i < 100000; i++) f(i, i + 1);

Breakpoint 1, js::ion::CodeGenerator::visitStart (this=0x101ed20, lir=0x10234e0)
    at /home/nicolas/mozilla/ionmonkey/js/src/ion/CodeGenerator.cpp:609
609     }

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7fb165a in ?? ()

Once you hit the generated breakpoint, you can replace it with a gdb breakpoint, which can then be made conditional. To do so, first overwrite the generated breakpoint with a nop instruction, then set a gdb breakpoint at the address of the nop.

(gdb) x /5i $pc - 1
   0x7ffff7fb1659:      int3   
=> 0x7ffff7fb165a:      mov    0x28(%rsp),%rax
   0x7ffff7fb165f:      mov    %eax,%ecx
   0x7ffff7fb1661:      mov    0x30(%rsp),%rdx
   0x7ffff7fb1666:      mov    %edx,%ebx

(gdb) # replace the int3 by a nop
(gdb) set *(unsigned char *) ($pc - 1) = 0x90
(gdb) x /1i $pc - 1
   0x7ffff7fb1659:      nop

(gdb) # set a breakpoint at the previous location
(gdb) b *0x7ffff7fb1659
Breakpoint 2 at 0x7ffff7fb1659
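
The one-byte store above is enough because int3 and nop are both single-byte x86 instructions (opcodes 0xCC and 0x90), so the instructions that follow are left intact. As a minimal sketch, the same replacement on a plain byte buffer might look like the following. PatchInt3WithNop and the buffer are hypothetical illustrations, not part of SpiderMonkey or gdb.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-byte x86 opcodes.
constexpr std::uint8_t kInt3 = 0xCC;
constexpr std::uint8_t kNop = 0x90;

// Hypothetical helper: replaces the int3 at code[offset] with a nop.
// Returns false, leaving the buffer untouched, if the offset is out of
// range or the byte at that offset is not an int3.
inline bool PatchInt3WithNop(std::vector<std::uint8_t>& code, std::size_t offset) {
    if (offset >= code.size() || code[offset] != kInt3)
        return false;
    code[offset] = kNop;
    assert(code[offset] == kNop);
    return true;
}
```

Refusing to patch when the byte is not 0xCC plays the same role as inspecting the code with x /5i before writing: it avoids corrupting an instruction when $pc - 1 is not the generated breakpoint.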

Finding the script of Ion generated assembly (from gdb)

When you hit a bug in the middle of IonMonkey-generated code, the first thing to note is that gdb's backtrace is not reliable, because the generated code does not keep a frame pointer. To find out which script you are in, you have to read the stack and infer the IonMonkey frame.

(gdb) x /64a $sp
0x7fffffff9838: 0x7ffff7fad2da  0x141
0x7fffffff9848: 0x7fffef134d40  0x2
(gdb) p (*(JSFunction**) 0x7fffffff9848)->u.i.script_->lineno
$1 = 1
(gdb) p (*(JSFunction**) 0x7fffffff9848)->u.i.script_->filename
$2 = 0xff92d1 "typein"

The stack is laid out as defined in js/src/ion/IonFrames-x86-shared.h. A frame is composed of the return address, a descriptor (a small value), the JSFunction (if the value is even) or the JSScript (if the value is odd; clear the low bit before dereferencing the pointer), and it ends with the number of actual arguments (also a small value). If you want to know at which LIR instruction the code is failing, the js::ion::CodeGenerator::generateBody function can be instrumented to dump the LIR id before each instruction.

for (; iter != current->end(); iter++) {
    IonSpew(IonSpew_Codegen, "instruction %s", iter->opName());

    // added: store the LIR id just below the stack pointer as a marker
    masm.store16(Imm32(iter->id()), Address(StackPointer, -8));
    if (!iter->accept(this))
        return false;
}


This modification adds an instruction that abuses the stack pointer to store an immediate value (the LIR id) at a location that no sane compiler would ever generate a store to. When dumping the assembly under gdb, such instructions are therefore easy to spot.
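
The even/odd rule for the third word of the frame, described earlier in this section, amounts to a pointer tagged in its low bit. The following is a minimal sketch of that decoding, assuming the tag is exactly the lowest bit. CalleeKind, DecodedCallee and DecodeCalleeWord are hypothetical names chosen for illustration, not SpiderMonkey's own definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only.
enum class CalleeKind { Function, Script };

struct DecodedCallee {
    CalleeKind kind;
    std::uint64_t pointer;  // the stack word with the tag bit cleared
};

// Assumption: an even word is a JSFunction pointer, an odd word is a
// JSScript pointer with its low bit set as a tag.
inline DecodedCallee DecodeCalleeWord(std::uint64_t word) {
    const std::uint64_t tagBit = 1;
    DecodedCallee result;
    if (word & tagBit) {
        result.kind = CalleeKind::Script;
        result.pointer = word & ~tagBit;
    } else {
        result.kind = CalleeKind::Function;
        result.pointer = word;
    }
    assert((result.pointer & tagBit) == 0);
    return result;
}
```

In the gdb session above, the word 0x7fffef134d40 is even, so it is read directly as a JSFunction pointer. An odd word would first need its low bit cleared before being cast to JSScript*.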

Break on valgrind errors

Sometimes a bug can be reproduced under valgrind but is hard to reproduce under gdb. One way to investigate is to let valgrind start gdb for you; the other way, documented here, is to let valgrind act as a gdb server that can be driven from a remote gdb.

$ valgrind --smc-check=all-non-file --vgdb-error=0 ./js …

This command will tell you how to start gdb as a remote. Be aware that functions which usually dump some output will do so in the shell where valgrind was started, not in the shell where gdb was started. Thus functions such as js_DumpBacktrace, when called from gdb, print their output in the shell running valgrind.

Hacking tips

Using the Gecko Profiler (browser / xpcshell)

See the section dedicated to profiling with the Gecko Profiler. This method of profiling has the advantage of mixing the JavaScript stack with the C++ stack, which is useful for analyzing library function issues. One tip is to start by looking at a script with an inverted JS stack to locate the most expensive JS function, then focus on the frame of that JS function, remove the inverted stack, and look at the C++ part of the function to determine where the cost comes from.

Using the JIT Inspector (browser)

Install the JIT Inspector add-on in your browser. This add-on provides the estimated cost of IonMonkey, JägerMonkey and the interpreter. It also provides a clean way to see whether instructions are inferred as monomorphic or polymorphic, along with the number of times each category of type has been observed.

Using callgrind (JS shell)

Because SpiderMonkey's just-in-time compilers rewrite the executed program, valgrind must be informed of this on the command line by adding --smc-check=all-non-file.

$ valgrind --tool=callgrind --callgrind-out-file=bench.clg --smc-check=all-non-file ./js ./run.js

The output file can then be used with kcachegrind, which provides a graphical view of the call graph.

Using IonMonkey spew (JS shell)

IonMonkey spew is extremely verbose (though not as much as the INFER spew), but you can filter it to focus on a list of compiled scripts or on specific channels. IonMonkey spew channels are selected with the IONFLAGS environment variable, and compilation spew is filtered with IONFILTER.

IONFLAGS contains the names of each channel separated by commas. The logs channel produces 2 files in /tmp/, one (/tmp/ion.json) made to be used with iongraph (made by Sean Stangl) and another one (/tmp/ion.cfg) made to be used with c1visualizer. These tools will show the MIR & LIR steps done by IonMonkey during the compilation.

Compilation logs and spew can be filtered with the IONFILTER environment variable, which contains locations in the form printed by the other spew channels. Multiple locations can be given, separated by commas.

$ IONFILTER=pdfjs.js:16934 IONFLAGS=logs,scripts,osi,bailouts ./js ./run.js 2>&1 | less
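
To make the shape of an IONFILTER value concrete, here is a minimal sketch that splits a comma-separated list of file:line locations. It only illustrates the format described above and is not the parser SpiderMonkey uses. FilterLocation and ParseFilterLocations are hypothetical names, and silently skipping malformed entries is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type, for illustration only.
struct FilterLocation {
    std::string filename;
    unsigned lineno;
};

// Splits "file:line,file:line,..." into locations. Entries without a
// filename, without a colon, or without a purely numeric line are skipped.
inline std::vector<FilterLocation> ParseFilterLocations(const std::string& value) {
    std::vector<FilterLocation> locations;
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t comma = value.find(',', start);
        if (comma == std::string::npos)
            comma = value.size();
        const std::string entry = value.substr(start, comma - start);
        start = comma + 1;

        // Use the last colon so that filenames containing ':' still work.
        const std::size_t colon = entry.rfind(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 >= entry.size())
            continue;
        assert(colon < entry.size());

        unsigned lineno = 0;
        bool digitsOnly = true;
        for (std::size_t i = colon + 1; i < entry.size(); i++) {
            const char c = entry[i];
            if (c < '0' || c > '9') {
                digitsOnly = false;
                break;
            }
            lineno = lineno * 10 + static_cast<unsigned>(c - '0');
        }
        if (digitsOnly)
            locations.push_back({entry.substr(0, colon), lineno});
    }
    return locations;
}
```

With the value from the command above, ParseFilterLocations("pdfjs.js:16934") yields a single location whose filename is pdfjs.js and whose lineno is 16934.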

The bailouts channel is likely the first thing you should focus on, because a bailout means that something does not stay in IonMonkey and falls back to the interpreter. This channel outputs the locations (as returned by the id() function of both instructions) of the latest MIR and the latest LIR phases. These locations should correspond to phases in the logs, and a filter can be used to remove uninteresting functions.

[Hack] Replacing one instruction.

To replace one specific instruction, you can use, in the visit function of that instruction, the filename and lineno fields of the JSScript as well as the id() of the LIR / MIR instruction. The JSScript can be obtained from info().script().

bool
CodeGeneratorX86Shared::visitGuardShape(LGuardShape *guard)
{
    if (info().script()->lineno == 16934 && guard->id() == 522) {
        [… another impl only for this one …]
        return true;
    }
    [… old impl …]
}