Where can I get Valgrind?
Linux: http://valgrind.org/ or use your distro's package manager
Mac: sudo port install valgrind
You can also get Valgrind trunk from SVN and build it yourself.
Is there a shared memcheck suppression file for known bugs?
Jesse has one somewhere...
What do I do if the JIT crashes on startup?
Pass the parameter --smc-check=all to valgrind for now.
Note: this option makes valgrind run much slower. An alternative solution is to turn both the content and chrome JITs off.
Or build Mozilla with
How Do I Run A Mochitest Under Valgrind?
See the Mochitest docs for more information about running mochitests.
Tips for improving performance and accuracy of Valgrind's Memcheck tool
Running Firefox on Valgrind's Memcheck tool can be a frustratingly slow experience. But there are things you can do to improve this. None of the following is by itself a silver bullet, but taken together they do help considerably. A sample mozconfig file incorporating all the suggestions is shown below.
- Use a decent machine. Valgrind is tremendously memory-intensive, so the single most important factor is having a large level 2 cache. 2MB is a bare minimum, 4MB or more is preferable. Most mid-to-upper range Intel Core 2s and Core iXs have 4MB or larger L2/L3s and work well. I'd guess the latest mid-to-upper-range AMDs are also good, although I haven't tried them recently.
- Build Firefox with JEMalloc disabled. Despite considerable efforts, Memcheck does not adequately understand JEMalloc's behaviour, and loses a lot of its error-detection capability if you use it. Hence it is pretty much mandatory to use the standard system implementation of malloc/free/new/delete if you want sensible results from Memcheck.
- Build Firefox with "-g -O". Don't use a plain "-g" (unoptimized) build. Checking memory references takes Valgrind a lot of time. At -O0 (no optimization), gcc doesn't do much register allocation, so the generated code has many unnecessary memory references which slow Valgrind down. At -O (that is, -O1) most of those disappear, whilst retaining pretty good stack-unwind-ability, so that Valgrind can still produce sane stack traces. The dangers with running optimised code on Memcheck are (1) a somewhat increased risk of false positive uninitialised-value errors, and (2) incomplete or incomprehensible stack traces. At -O1 neither of these seems significant. If you want to live on the bleeding edge, and you have gcc-4.3 or later, try "-g -O2". This appears to give reasonable results too. There may be future improvements to Valgrind to mitigate problem (1) at -O2, and newer gccs appear to give better debug info at high optimisation levels, thereby mitigating (2).
- Use 64-bit builds of Fx in preference to 32-bit builds. 64-bit code has more available registers and better calling conventions, both of which reduce the number of memory references Valgrind has to check.
- Avoid --smc-check=all unless you really need it (because the JIT or JITted code crashes). You can avoid it if you build Fx with --enable-valgrind. In an ideal world we could fold the relevant pieces of magic enabled by --enable-valgrind into the Fx code base, so --smc-check would never be needed.
- Don't use --track-origins=yes unless you are hunting down a specific uninitialised-value error. It pretty much halves the speed of Valgrind. That said, it is still way faster than tracking down sources of uninitialised data by hand, and so constitutes a net programmer productivity win.
- Use Linux rather than MacOS. Unfortunately, Valgrind has difficulties with threaded code on MacOS, which sometimes cause it to run far slower than on Linux. Fixing this in Valgrind will not be simple. Such difficulties do not occur on Linux. If your code is single threaded you can of course ignore this point. One crucial note, if you work on Linux, is that you must disable JEMalloc ("ac_add_options --disable-jemalloc") to get sane results from Memcheck.
- Use the latest Valgrind trunk from SVN. It's easy to download and build. The trunk sometimes contains optimizations not yet present in formal releases. Current trunk contains improvements in handling of 64-bit code relative to the released 3.5.0. It also contains an experimental flag --vex-guest-chase-cond=yes which improves performance of the instrumented code at the cost of making the instrumentation take a little longer. It is therefore most useful for longer-running programs, e.g. longer runs of Fx.
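The run-time flags from the list above can be collected in a small wrapper. The following is only a sketch: the variable names FIREFOX_BIN, NEED_SMC_CHECK and HUNT_UNINIT, and the default binary path, are assumptions made for this example and are not Valgrind or Mozilla settings. It prints the command line rather than running it, so you can inspect it first.

```shell
#!/bin/sh
# Sketch: assemble a Memcheck command line from the tips above.
# FIREFOX_BIN, NEED_SMC_CHECK and HUNT_UNINIT are hypothetical names
# invented for this example.

memcheck_cmd() {
    # Hypothetical default location of the Firefox binary.
    firefox_bin=${FIREFOX_BIN:-./dist/bin/firefox}
    flags="--tool=memcheck"

    # Slow; only needed if the JIT crashes and the build
    # was not configured with --enable-valgrind.
    if [ "${NEED_SMC_CHECK:-0}" = "1" ]; then
        flags="$flags --smc-check=all"
    fi

    # Costs a lot of speed; enable only while hunting a
    # specific uninitialised-value error.
    if [ "${HUNT_UNINIT:-0}" = "1" ]; then
        flags="$flags --track-origins=yes"
    fi

    echo "valgrind $flags $firefox_bin"
}

# Print the command for the default (fastest) configuration.
memcheck_cmd
```

Setting NEED_SMC_CHECK=1 or HUNT_UNINIT=1 in the environment adds the corresponding flag. Leaving both unset gives the fastest configuration.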
Using all these together, on a Core i5 670 (3.46 GHz) running 64-bit Linux, I can surf the web, reading news sites over my morning coffee, whilst running on Memcheck. The delays are such that it is obvious that Fx is not running natively, but they are small enough that I spend most of my time reading and not much time waiting for Fx. Here's a recommended mozconfig:
ac_add_options --enable-optimize="-g -O -freorder-blocks"
As per comments above, I've been experimenting recently with -O2 rather than "-O -freorder-blocks", for maximum effect.
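The mozconfig line above covers only the optimisation flags. A fuller sketch, combining it with the --disable-jemalloc and --enable-valgrind options mentioned in the tips, might look like the following. The file name mozconfig.valgrind is an assumption made for this example; any name works as long as MOZCONFIG points at it.

```shell
# Write a mozconfig that collects the build options discussed above.
# "mozconfig.valgrind" is a hypothetical file name chosen for this sketch.
cat > mozconfig.valgrind <<'EOF'
# Use the system malloc so that Memcheck keeps its error-detection ability.
ac_add_options --disable-jemalloc
# Per the tips above, this removes the need for --smc-check=all.
ac_add_options --enable-valgrind
# Optimised code that still gives usable stack traces.
ac_add_options --enable-optimize="-g -O -freorder-blocks"
EOF

# Point the build at the new file, then build as usual.
export MOZCONFIG="$PWD/mozconfig.valgrind"
```

To experiment with -O2, replace the value of --enable-optimize with "-g -O2".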
Per-platform comments, current as of 4 Feb 2010
Linux on X86/AMD64/PPC32/PPC64
These work out of the box, either via the 3.5.0 release or from trunk sources. The X86 and AMD64 ports are widely used. We also support PPC32 and PPC64 on Linux. This can be interesting in that Valgrinding on those platforms throws up the occasional endianness bug which otherwise might have gone unnoticed.
On PPC32 and PPC64, Valgrind observes and honours the icache flush instructions, so you get transparent support for JIT-generated code without having to use --enable-valgrind or --smc-check=all.
For un-released Linux distros (Ubuntu 10.04, Fedora Rawhide, etc) you'll need the trunk sources, since they have some important fixes for the latest gcc (4.5) and glibc (2.11). Without them you'll be flooded with false errors from Memcheck, and have debuginfo reading problems.
Linux on ARMv7
This is new and experimental, but works quite well -- it is able to run Fx and other large C++ applications. You need to use trunk sources for this. ARMv7 is the minimum supported platform, so you'll need a Beagleboard, an N900, or something else containing a Cortex-A8 CPU. JIT-generated code is transparently supported, as on PPC32/64 (see comments just above). Currently supported instruction sets are ARMv5 and VFPv1. It basically works fine for Ubuntu 9.04 on ARMv7.
One important caveat is that -- unlike on other platforms -- stack traces will end in any code not compiled with -g, so you need to compile everything with -g. This doesn't mean you can't use optimisation, though -- so the standard recommendation of "-g -O" stands.
MacOSX 10.5.x on X86 (32-bit)
Works out of the box as of 3.5.0. You need to always run with --dsymutil=yes, otherwise you'll never get any line number info. See caveats above re performance on heavily threaded code.
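One way to avoid forgetting the flag is to add it automatically on Mac. The sketch below assumes a helper named platform_flags and a Firefox binary at ./dist/bin/firefox. Both are hypothetical and exist only for this example. It prints the resulting command rather than running it.

```shell
#!/bin/sh
# Sketch: choose extra Valgrind flags from the kernel name.
# platform_flags is a hypothetical helper, not part of Valgrind.

platform_flags() {
    # $1: kernel name as printed by `uname -s`
    case "$1" in
        Darwin) echo "--dsymutil=yes" ;;  # needed for line numbers on Mac
        *)      echo "" ;;                # nothing extra elsewhere
    esac
}

# Hypothetical binary path; adjust it to your build tree.
echo "valgrind $(platform_flags "$(uname -s)") ./dist/bin/firefox"
```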
MacOSX 10.6 on X86 (32-bit)
Does not really work. There are unmerged patches which claim to make it work, but we haven't checked through them nor pushed them into the trunk. If you really want this, please express an interest to Nick Nethercote and Julian Seward.
MacOSX 10.5/6 on X86 (64-bit)
64-bit kind-of works, but really needs attention. To make it work well would require considerable reworking of one of the underlying pieces of Valgrind's infrastructure, unfortunately. Again, if you want this, please contact Nick and Julian.
Windows on X86 (32-bit)
Valgrind doesn't support Windows directly. However, if you want to live right on the leading edge, it is possible to run a debug Win32 build of Fx on Wine (trunk) on Valgrind (trunk), and get sensible results. Valgrind contains a rudimentary PDB file reader, so you can get source locations in MSVC compiled code. A couple of bugs in Fx have been filed as a result of preliminary investigations using this setup. The basic recipe is documented here, but it's all a bit flaky and needs further work. If you're interested in this please contact Julian. It's not as scary as it sounds, and I would like to get this to the status of being usable and useful.