Debugging out of memory errors on Firefox OS

When a Firefox OS device runs out of memory, the the low-memory killer and low-memory notifications systems come into play to kill some processes and keep the OS running. When the kernel chooses to kill the foreground process, this manifests as an apparent crash of the app you're using. This article describes how to understand and debug (out of memory) OOM crashes.

Note: If you don't already know how low memory situations are managed on Firefox OS, you are advised to read Out of memory management on Firefox OS before continuing with this document.

Debugging an OOM crash

Suppose you have a reproducible crash that you suspect is caused by the phone running out of memory.  The following are steps you can take to understand more about what's going wrong.

Step 1: Verify that it's actually an OOM

First, we need to check whether the crash is actually due to the phone running out of memory.  To do this, run adb shell dmesg.  If the app is being killed due to OOM, you'll see something like the following response:

<4>[06-18 07:40:25.291] [2897: Notes+]send sigkill to 2897 (Notes+), adj 2, size 30625

This line indicates that the phone's low-memory killer killed the Notes+ app (process id 2897), which had oom_adj 2 when it was killed.  The size reported here is in pages, which are 4kb each.  So in this case, the Notes+ app was using 30625 * 4kb = 120mb of memory.

Digression: If it's not an OOM

If you don't see a line like this in the dmesg output, your crash is likely not an OOM.  The next step in debugging such a crash is usually to attach gdb to the crashing process and get a backtrace, which can be done like so:

$ cd path/to/B2G/checkout
$ adb shell b2g-ps
# Note pid of the app that you're going to crash
$ ./ attach <pid>
(gdb) continue
# crash the app
(gdb) bt

When reporting the bug, attach this output, along with the output of adb logcat. If your crash is due to OOM, a gdb backtrace is probably not interesting, because an OOM crash is triggered by a signal sent from the kernel, not by bad code that the process executes.

Step 2: Collect memory reports

After you've verified that your crash is actually due to OOM, the next step is to collect a memory report from the phone before the app crashes.  A memory report will help us understand where memory is being used. This step is a bit tricky because once an app crashes, there's no way to collect a memory report from that process.  There's also no way to trigger a memory report when the kernel tries to kill a process — by then, it's too late.

To pull a memory report from the phone, first update your build tree so you get the latest version of the relevant tool.  repo sync is not sufficient; you must git fetch && git merge or git pull:

$ cd path/to/B2G/checkout
$ git fetch origin
$ git merge --ff-only origin

Now you can run the memory reporting tool like so:

$ tools/

Once you get a memory report you're happy with, you can zip up the directory (named about-memory-N) and attach it to the relevant bug. But again, this is only helpful if you run this command while the app you care about is alive and using a lot of memory.  We have a few options here.

Step 2, option 1: Get a different device

Often the easiest thing to do is to get a device with more RAM.  You know from step 1 above how much memory the process used when it crashed, so you can simply wait until the process is using about that much memory, and then take a memory report. The b2g-info tool shows you how much memory the different B2G processes are using.  You can run this tool in a loop by doing something like the following:

$ adb shell 'while true; do b2g-info; sleep 1; done'

If b2g-info isn't available on your device, you can use b2g-procrank instead.

Step 2, option 2: Fastest finger

If you don't have access to a device with more RAM, you can try to run just before the app crashes.  Again, you can run b2g-info in a loop (as shown in the previous section) to figure out when to run Running a memory report freezes all of the processes on the phone for a few moments, so it's often not difficult to grab a memory report soon before a process OOMs itself.

Step 2, option 3: Use a smaller testcase

We often hit OOMs when doing something like "loading a file of at least size X in the app."

If the app crashes very quickly with a testcase of size X, you could try running a similar but smaller testcase (say, size X/2) and capturing a memory report after that succeeds.  The memory report generated this way often gives us good insights into the OOM crash that we ultimately care about.

Step 2, option 4: Run B2G on your desktop

If the worst comes to the worst, you can run B2G on your desktop, which probably has much more RAM than your FxOS phone.  This is tricky because B2G running on a desktop machine is a different in some key ways from B2G running on a phone.

In particular, B2G on desktop machines has multiprocess disabled by default.  It doesn't really work 100% correctly anywhere, but it works most accurately on Linux and Mac.  (Follow Bug 923961, Bug 914584, Bug 891882)  You can test on your desktop without multiprocess enabled, but in my experience a lot of our high memory usage issues are caused by our interprocess communication code, so that won't necessarily trigger the bug you're seeing.

It's also not as convenient to take memory reports from a B2G desktop process.  On Linux, you can send signal 34 to the main B2G process and it'll write memory-report-*.gz files out to /tmp.

One advantage to using B2G desktop builds is that you can use your favorite desktop debugging tools, such as Instruments on Mac OSX.  We've had a lot of success with this in the past. To collect a memory report using Instruments on OS X, choose "New -> Mac OS X -> Allocations". Start b2g-desktop and you should see multiple "plugin-container" processes in the activity monitor. You will need 2 Instruments activities: 1 to trace the allocations on the main b2g process and another to trace the allocations on the app you wish to analyze. Attach the instrument activities and execute your test case.

To analyze how much memory your app is using, analyze call trees. Check the "Invert Call Tree" tick, and sort by bytes used. This will show you which part of your app is using lots of memory. Below is a screenshot of a sample analysis of memory usage for an app:

Screen shot of instruments.

For more information on setting up B2G desktop builds, read our Hacking Gaia page.

Step 3: Analyze the memory report

When you run, it will open a memory report in Firefox.  This file contains information about the memory usage of all processes on the system. Reading these reports can be a bit overwhelming at first, but it's not so bad once you get the hang of it.  Note that you can hover over any leaf node to get a description of what that node describes. What you're looking for is something "unusually large" in the crashing process.  You can get an idea of what "unusually large" means by capturing a memory report of your app when it's not using a ton of memory and comparing that to the errant memory report.

Reading memory reports takes some practice, so feel free to ask for help.  The experts on this subject hang out in #memshrink on IRC.

Step 4: Rebuild with DMD, if necessary

One common line item to stick out in memory reports captured before apps crash is heap-unclassifiedheap-unclassified counts memory allocated by the process that isn't covered by any other memory reporter.  If you have high heap-unclassified, the memory report can't tell you anything else about what that memory belongs to. Our tool for digging into heap-unclassified is called DMD.  This works on B2G, but you must build B2G yourself in order for it to work because DMD requires local symbols that are only kept on the build machine.

To find out more information on running DMD and interpreting its output, read the DMD documentation.