Debugging Mozilla with Valgrind

  • Creator: Julian Seward

Where can I get Valgrind?

Linux: http://valgrind.org/ or use your distro's package manager

Mac: sudo port install valgrind

You can also get Valgrind trunk from SVN (see http://valgrind.org/downloads/repository.html) and build it yourself.
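
Building from trunk might look roughly like the sketch below. This is not a verified recipe: the SVN URL is an assumption based on the repository instructions of the time and may no longer be live, the install prefix is a placeholder, and the names `run`, `build_valgrind_trunk`, `DRY_RUN` and `PREFIX` are hypothetical. By default the script only prints the steps, so they can be checked against the repository page before anything is executed.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each step; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
# Hypothetical install location; pick whatever suits your machine.
PREFIX="${PREFIX:-$HOME/valgrind-trunk}"

# Print the command in dry-run mode, otherwise run it in the current shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_valgrind_trunk() {
    # ASSUMPTION: repository URL as published at the time; verify before use.
    run svn co svn://svn.valgrind.org/valgrind/trunk valgrind &&
    run cd valgrind &&
    run ./autogen.sh &&
    run ./configure --prefix="$PREFIX" &&
    run make &&
    run make install
}

build_valgrind_trunk
```

Installing under a private prefix keeps the trunk build separate from any distro-packaged Valgrind.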

Is there a shared memcheck suppression file for known bugs?

Jesse has one somewhere...

What do I do if the JIT crashes on startup?

Pass the parameter --smc-check=all to Valgrind for now.

Note: this option makes Valgrind run much slower. Alternatives are to turn off both the content and chrome JITs, or to build Mozilla with --enable-valgrind (experimental).
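
As a minimal sketch of the difference between the workarounds, the helper below prints the Valgrind command line with and without --smc-check=all. The binary path `./dist/bin/firefox` and the helper name `vg_cmd` are assumptions for illustration; only the --smc-check=all flag comes from the text above.

```shell
#!/bin/sh
# Hypothetical location of the Firefox binary inside your object directory.
FIREFOX_BIN="${FIREFOX_BIN:-./dist/bin/firefox}"

# Print (not run) a valgrind command line.
# $1 = "smc" adds --smc-check=all; anything else gives a plain run.
vg_cmd() {
    if [ "$1" = "smc" ]; then
        echo "valgrind --smc-check=all $FIREFOX_BIN"
    else
        echo "valgrind $FIREFOX_BIN"
    fi
}

# JITs left on: slower, but JIT-generated code is re-checked by Valgrind.
vg_cmd smc
# Plain run: only appropriate if both JITs are off or the build used --enable-valgrind.
vg_cmd plain
```

Printing rather than executing makes it easy to inspect the command before committing to a slow Valgrind run.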

How do I run a Mochitest under Valgrind?

make mochitest-plain TEST_PATH=relative/path/test_mything.html EXTRA_TEST_ARGS='--debugger=valgrind --setpref=javascript.options.jit.chrome=false --setpref=javascript.options.jit.content=false'

See the Mochitest docs for more information about running mochitests.

Tips for improving performance and accuracy of Valgrind's Memcheck tool

Running Firefox on Valgrind's Memcheck tool can be a frustratingly slow experience.  But there are things you can do to improve this.  None of the following is by itself a silver bullet, but taken together they do help considerably.  A sample mozconfig file incorporating all the suggestions is shown below.

  1. Use a decent machine.  Valgrind is tremendously memory-intensive, so the single most important factor is having a large level 2 cache.  2MB is a bare minimum, 4MB or more is preferable.  Most mid-to-upper range Intel Core 2s and Core iXs have 4MB or larger L2/L3s and work well.  I'd guess the latest mid-to-upper-range AMDs are also good, although I haven't tried them recently.
  2. Build Firefox with JEMalloc disabled.  Despite considerable efforts, Memcheck does not adequately understand JEMalloc's behaviour, and loses a lot of its error-detection capability if you use it.  Hence it is pretty much mandatory to use the standard system implementation of malloc/free/new/delete if you want sensible results from Memcheck.
  3. Build Firefox with "-g -O".  Don't use a plain "-g" (unoptimized) build.  Checking memory references takes Valgrind a lot of time.  At -O0 (no optimization), gcc doesn't do much register allocation, so the generated code has many unnecessary memory references which slow Valgrind down.  At -O (that is, -O1) most of those disappear, while stack unwinding remains reliable enough that Valgrind can still produce sane stack traces.  The difficulties with running optimised code on Memcheck are (1) a somewhat increased risk of false positive uninitialised-value errors, and (2) incomplete or incomprehensible stack traces.  At -O1 neither of these seems significant.  If you want to live dangerously, and you have gcc-4.3 or later, try "-g -O2".  This appears to give reasonable results too.  There may be future improvements to Valgrind to mitigate problem (1) at -O2, and newer gccs appear to give better debug info at high optimisation levels, thereby mitigating (2).
  4. Use 64-bit builds of Fx in preference to 32-bit builds.  64-bit code has more available registers and better calling conventions, both of which reduce the number of memory references Valgrind has to check.
  5. Avoid --smc-check=all unless you really need it (because the JIT or JITted code crashes).  You can avoid it if you build Fx with --enable-valgrind.  In an ideal world we could fold the relevant pieces of magic enabled by --enable-valgrind into the Fx code base, so --smc-check would never be needed.
  6. Don't use --track-origins=yes unless you are hunting down a specific uninitialised-value error.  It pretty much halves the speed of Valgrind.  That said, it is still way faster than tracking down sources of uninitialised data by hand, and so constitutes a net programmer productivity win.
  7. Use Linux rather than MacOS.  Unfortunately, Valgrind has difficulties with threaded code on MacOS, which sometimes cause it to run far slower than on Linux.  Fixing this in Valgrind will not be simple.  Such difficulties do not occur on Linux.  If your code is single threaded you can of course ignore this point.  One crucial note, if you work on Linux, is that you must disable JEMalloc ("ac_add_options --disable-jemalloc") to get sane results from Memcheck.
  8. Use the latest Valgrind trunk from SVN.  It's easy to download and build.  The trunk sometimes contains optimizations not yet present in formal releases.  At a bare minimum, use the stock 3.5.0 rather than ancient versions (3.2.x, 3.3.x, etc).  Current trunk contains improvements in handling of 64-bit code relative to the released 3.5.0.  It also contains an experimental flag, --vex-guest-chase-cond=yes, which improves performance of the instrumented code at the cost of making the instrumentation take a little longer.  It is therefore most useful for longer-running programs, e.g. longer runs of Fx.
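
The run-time side of tips 5, 6 and 8 can be sketched as a small helper that assembles Memcheck flags from a few switches. The toggle names `TRACK_ORIGINS`, `NEED_SMC_CHECK` and `TRUNK_VALGRIND` and the function name `memcheck_flags` are hypothetical; the Valgrind flags themselves are the ones named in the list above.

```shell
#!/bin/sh
# Print a Memcheck flag string; each expensive flag is off unless requested.
memcheck_flags() {
    flags="--tool=memcheck"
    # Tip 6: origin tracking roughly halves speed, so enable it only when
    # hunting a specific uninitialised-value error.
    if [ "${TRACK_ORIGINS:-no}" = "yes" ]; then
        flags="$flags --track-origins=yes"
    fi
    # Tip 5: only needed when the JIT or JITted code crashes and the build
    # was not configured with --enable-valgrind.
    if [ "${NEED_SMC_CHECK:-no}" = "yes" ]; then
        flags="$flags --smc-check=all"
    fi
    # Tip 8: experimental flag, described above as available in trunk only.
    if [ "${TRUNK_VALGRIND:-no}" = "yes" ]; then
        flags="$flags --vex-guest-chase-cond=yes"
    fi
    echo "$flags"
}

memcheck_flags
```

For example, running the script with `TRACK_ORIGINS=yes` set in the environment adds only the origin-tracking flag, which keeps the default run as fast as possible.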

Using all these together, on a Core i5 670 (3.46 GHz) running 64-bit Linux, I can surf the web, reading news sites over my morning coffee, whilst running on Memcheck.  The delays are such that it is obvious that Fx is not running natively, but they are small enough that I spend most of my time reading and not much time waiting for Fx.  Here's a recommended mozconfig:

. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/ff-opt
ac_add_options --enable-tests
ac_add_options --enable-optimize="-g -O -freorder-blocks"
ac_add_options --disable-jemalloc
ac_add_options --enable-valgrind
mk_add_options MOZ_MAKE_FLAGS="-j4"

As per comments above, I've been experimenting recently with -O2 rather than "-O -freorder-blocks", for maximum effect.
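
To switch between the two optimisation settings without editing the file by hand, something like the following sketch could generate the mozconfig. The names `OPT_FLAGS` and `write_mozconfig` are hypothetical; the emitted lines are the recommended mozconfig above with only the --enable-optimize value parameterised.

```shell
#!/bin/sh
# Optimisation flags to embed; defaults to the -O2 variant mentioned above.
OPT_FLAGS="${OPT_FLAGS:--g -O2}"

# Print a mozconfig to standard output.
write_mozconfig() {
    # \$topsrcdir is escaped so that it reaches the file unexpanded.
    cat <<EOF
. \$topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/ff-opt
ac_add_options --enable-tests
ac_add_options --enable-optimize="$OPT_FLAGS"
ac_add_options --disable-jemalloc
ac_add_options --enable-valgrind
mk_add_options MOZ_MAKE_FLAGS="-j4"
EOF
}

write_mozconfig
```

Redirect the output to your mozconfig file, and set `OPT_FLAGS="-g -O -freorder-blocks"` to get back the more conservative configuration.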

Per-platform comments, current as of 4 Feb 2010

Linux on X86/AMD64/PPC32/PPC64

These work out of the box, either via the 3.5.0 release or from trunk sources.  The X86 and AMD64 ports are widely used.  We also support PPC32 and PPC64 on Linux.  This can be interesting in that Valgrinding on those platforms throws up the occasional endianness bug which otherwise might have gone unnoticed.

On PPC32 and PPC64, Valgrind observes and honours the icache flush instructions, so you get transparent support for JIT-generated code without having to use --enable-valgrind or --smc-check=all.

For un-released Linux distros (Ubuntu 10.04, Fedora Rawhide, etc) you'll need the trunk sources, since they have some important fixes for the latest gcc (4.5) and glibc (2.11).  Without them you'll be flooded with false errors from Memcheck, and have debuginfo reading problems.

Linux on ARMv7

This is new and experimental, but works quite well -- it is able to run Fx and other large C++ applications.  You need to use trunk sources for this.  ARMv7 is the minimum supported platform, so you'll need a Beagleboard, an N900, or something else containing a Cortex-A8 CPU.  JIT-generated code is transparently supported, as on PPC32/64 (see comments just above).  Currently supported instruction sets are ARMv5 and VFPv1.  It basically works fine for Ubuntu 9.04 on ARMv7.

One important caveat is that -- unlike on other platforms -- stack traces will end in any code not compiled with -g, so you need to compile everything with -g.  This doesn't mean you can't use optimisation, though -- so the standard recommendation of "-g -O" stands.

MacOSX 10.5.x on X86 (32-bit)

Works out of the box as of 3.5.0.  You need to run permanently with --dsymutil=yes, otherwise you'll never get any line number info.  See caveats above re performance on heavily threaded code.
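
One way to avoid forgetting the flag is to pick it automatically from the platform name. The sketch below is illustrative only: `platform_flags` is a hypothetical helper and the binary path is a placeholder; --dsymutil=yes is the flag named above.

```shell
#!/bin/sh
# Print extra valgrind flags for a platform name as reported by `uname -s`.
platform_flags() {
    case "$1" in
        # Mac OS X: without this flag no line number info is shown.
        Darwin) echo "--dsymutil=yes" ;;
        # Other platforms need nothing extra here.
        *)      echo "" ;;
    esac
}

# Hypothetical location of the Firefox binary; adjust for your build.
FIREFOX_BIN="${FIREFOX_BIN:-./dist/bin/firefox}"

# Print the command line that would be used on this machine.
echo "valgrind $(platform_flags "$(uname -s)") $FIREFOX_BIN"
```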

MacOSX 10.6 on X86 (32-bit)

Does not really work.  There are unmerged patches which claim to make it work, but we haven't checked through them nor pushed them into the trunk.  If you really want this, please express an interest to Nick Nethercote and Julian Seward.

MacOSX 10.5/6 on X86 (64-bit)

64-bit kind-of works, but really needs attention.  To make it work well would require considerable reworking of one of the underlying pieces of Valgrind's infrastructure, unfortunately.  Again, if you want this, please contact Nick and Julian.

Windows on X86 (32-bit)

Valgrind doesn't support Windows directly.  However, if you want to live right on the leading edge, it is possible to run a debug Win32 build of Fx on Wine (trunk) on Valgrind (trunk), and get sensible results.  Valgrind contains a rudimentary PDB file reader, so you can get source locations in MSVC compiled code.  A couple of bugs in Fx have been filed as a result of preliminary investigations using this setup.  The basic recipe is documented at http://wiki.winehq.org/Wine_and_Valgrind, but it's all a bit flaky and needs further work.  If you're interested in this please contact Julian.  It's not as scary as it sounds, and I would like to get this to the status of being usable and useful.
