Debugging out of memory errors on Firefox OS

Firefox OS/B2G runs on severely memory-constrained devices, and it's easy for apps to exhaust the memory available on the system.  When a process exhausts the memory available on the system, the kernel must kill some other processes in order to free up memory.  When the kernel chooses to kill the foreground process, this manifests as an apparent crash of the app you're using. This article describes how B2G's multiprocess architecture affects what the phone does when we run out of memory, and how to understand and debug OOM crashes.

Process priorities

B2G uses multiple processes when it runs on a phone: one "main process" and potentially many "child processes". Every app runs in its own child process, with one exception: the browser app runs in the main process, while the tabs inside the browser app each run in their own child process. The process we kill when we run out of memory isn't necessarily the one that "caused" the out-of-memory condition. B2G assigns a priority to each process based on how important it thinks the process is, and when the system runs out of memory, it kills processes strictly in order of priority, lowest priority first.

A process's priority is known as its oom_adj.  Smaller oom_adj values correspond to higher priority processes. Killing the main process kills all child processes and effectively reboots the phone, so we never want to kill the main process.  Therefore, the main process runs with oom_adj 0.

Most child processes run with oom_adj 2 while they're in the foreground. Child processes in the background run with oom_adj between 3 and 6 (inclusive). The exact oom_adj a background child process gets depends on a number of factors, such as whether it's playing sound, whether it's the homescreen app, and so on.
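The priority bands described above can be summarized in a small shell helper. This is only a sketch: the function name describe_oom_adj is hypothetical, and the bands are the ones stated in this article rather than values read from the B2G source.

```shell
# Map an oom_adj value to the role this article associates with it.
# describe_oom_adj is a hypothetical helper; the bands come from the text above.
describe_oom_adj() {
  case "$1" in
    0) echo "main process" ;;
    2) echo "foreground child process" ;;
    3|4|5|6) echo "background child process" ;;
    *) echo "not covered by this article" ;;
  esac
}

describe_oom_adj 2
```

On kernels that still expose the legacy /proc/<pid>/oom_adj file, you can probably read a process's current value with adb shell cat /proc/<pid>/oom_adj and pass it to a helper like this one.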

Debugging an OOM crash

Suppose you have a reproducible crash that you suspect is caused by the phone running out of memory.  The following are steps you can take to understand more about what's going wrong.

Step 1: Verify that it's actually an OOM

First, we need to check whether the crash is actually due to the phone running out of memory.  To do this, run adb shell dmesg.  If the app is being killed due to OOM, you'll see something like the following response:

<4>[06-18 07:40:25.291] [2897: Notes+]send sigkill to 2897 (Notes+), adj 2, size 30625

This line indicates that the phone's low-memory killer killed the Notes+ app (process ID 2897), which had oom_adj 2 when it was killed. The size reported here is in pages, which are 4 KB each. So in this case, the Notes+ app was using 30625 * 4 KB = 122,500 KB, or about 120 MB of memory.
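If the dmesg output is long, a filter can pull out the kill lines and do the page arithmetic for you. The following is a minimal sketch, assuming the line format shown in the example above and 4 KB pages; summarize_lmk is a hypothetical helper name, and other kernels may log the kill in a different format.

```shell
# Summarize low-memory-killer lines read from stdin.
# Assumes the "send sigkill to PID (NAME), adj A, size PAGES" format and 4 KB pages.
summarize_lmk() {
  sed -n 's/.*send sigkill to \([0-9][0-9]*\) (\(.*\)), adj \([0-9][0-9]*\), size \([0-9][0-9]*\).*/\1|\3|\4|\2/p' |
  while IFS='|' read -r pid adj pages name; do
    kb=$((pages * 4))
    # Round to the nearest MB rather than truncating.
    echo "pid=$pid name=$name adj=$adj size=${kb}KB (about $(( (kb + 512) / 1024 ))MB)"
  done
}

# Example using the line from this article; on a device you would run:
#   adb shell dmesg | summarize_lmk
printf '%s\n' '<4>[06-18 07:40:25.291] [2897: Notes+]send sigkill to 2897 (Notes+), adj 2, size 30625' | summarize_lmk
```

Lines that don't match the pattern are dropped, so an empty result suggests the crash was not an OOM kill.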

Digression: If it's not an OOM

If you don't see a line like this in the dmesg output, your crash is likely not an OOM.  The next step in debugging such a crash is usually to attach gdb to the crashing process and get a backtrace, which can be done like so:

$ cd path/to/B2G/checkout
$ adb shell b2g-ps
# Note pid of the app that you're going to crash
$ ./ attach <pid>
(gdb) continue
# crash the app
(gdb) bt

When reporting the bug, attach this output, along with the output of adb logcat. If your crash is due to OOM, a gdb backtrace is probably not interesting, because an OOM crash is triggered by a signal sent from the kernel, not by bad code that the process executes.

Step 2: Collect memory reports

After you've verified that your crash is actually due to OOM, the next step is to collect a memory report from the phone before the app crashes.  A memory report will help us understand where memory is being used. This step is a bit tricky because once an app crashes, there's no way to collect a memory report from that process.  There's also no way to trigger a memory report when the kernel tries to kill a process — by then, it's too late.

To pull a memory report from the phone, first update your build tree so you get the latest version of the relevant tool.  repo sync is not sufficient; you must git fetch && git merge or git pull:

$ cd path/to/B2G/checkout
$ git fetch origin
$ git merge --ff-only origin

Now you can run the memory reporting tool like so:

$ tools/

Once you get a memory report you're happy with, you can zip up the directory (named about-memory-N) and attach it to the relevant bug. But again, this is only helpful if you run this command while the app you care about is alive and using a lot of memory.  We have a few options here.
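After several runs you may have more than one about-memory-N directory. The sketch below picks the one with the highest N so you can zip the most recent report. It assumes the directories sit directly under the directory you pass in (the current directory by default); latest_report_dir is a hypothetical helper name.

```shell
# Print the about-memory-N directory with the highest N under a given directory.
# latest_report_dir is a hypothetical helper name.
latest_report_dir() {
  base=${1:-.}
  best=""
  best_n=-1
  for d in "$base"/about-memory-*/; do
    [ -d "$d" ] || continue
    d=${d%/}
    n=${d##*about-memory-}
    # Skip directories whose suffix is not a plain number.
    case "$n" in
      ''|*[!0-9]*) continue ;;
    esac
    if [ "$n" -gt "$best_n" ]; then
      best_n=$n
      best=$d
    fi
  done
  if [ -n "$best" ]; then
    echo "$best"
  fi
}

# Possible usage:
#   dir=$(latest_report_dir .) && zip -r "$dir.zip" "$dir"
```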

Step 2, option 1: Get a different device

Often the easiest thing to do is to get a device with more RAM.  You know from step 1 above how much memory the process used when it crashed, so you can simply wait until the process is using about that much memory, and then take a memory report. The b2g-info tool shows you how much memory the different B2G processes are using.  You can run this tool in a loop by doing something like the following:

$ adb shell 'while true; do b2g-info; sleep 1; done'

If b2g-info isn't available on your device, you can use b2g-procrank instead.

Step 2, option 2: Fastest finger

If you don't have access to a device with more RAM, you can try to run the memory reporting tool just before the app crashes. Again, you can run b2g-info in a loop (as shown in the previous section) to figure out when to run it. Running a memory report freezes all of the processes on the phone for a few moments, so it's often not difficult to grab a memory report shortly before a process OOMs itself.
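Rather than watching the loop by eye, you could let the shell wait for a threshold. The following is a sketch under stated assumptions: wait_until_at_least is a hypothetical helper, and it relies on a sampler command that you supply, which must print a single integer such as the app's memory use in MB. This article doesn't describe the output format of b2g-info, so extracting that number is left to the sampler.

```shell
# Run a sampler command repeatedly until the number it prints reaches a threshold.
# wait_until_at_least and the sampler are hypothetical; the sampler must print
# a single non-negative integer (for example, the app's memory use in MB).
wait_until_at_least() {
  threshold=$1
  interval=$2
  shift 2
  while :; do
    value=$("$@") || return 1
    # Give up if the sampler printed something that is not a plain number.
    case "$value" in
      ''|*[!0-9]*) return 1 ;;
    esac
    if [ "$value" -ge "$threshold" ]; then
      echo "$value"
      return 0
    fi
    sleep "$interval"
  done
}

# Possible usage, with my_sampler standing in for your own command:
#   wait_until_at_least 100 1 my_sampler && echo "take the memory report now"
```

Pick a threshold somewhat below the size reported in step 1, so there is still time to capture the report before the kernel kills the process.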

Step 2, option 3: Use a smaller testcase

We often hit OOMs when doing something like "loading a file of at least size X in the app."

If the app crashes very quickly with a testcase of size X, you could try running a similar but smaller testcase (say, size X/2) and capturing a memory report after that succeeds.  The memory report generated this way often gives us good insights into the OOM crash that we ultimately care about.
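If the testcase is a file that remains valid when truncated (plain text, for example), one way to get a size X/2 variant is to keep only its first half. This is an illustrative sketch, not a general recipe: half_testcase is a hypothetical helper, and structured formats such as images or archives usually need a smaller file produced some other way.

```shell
# Write the first half (by bytes) of a testcase file to a new file.
# half_testcase is a hypothetical helper; it only makes sense for inputs that
# remain valid when truncated, such as plain text.
half_testcase() {
  src=$1
  dst=$2
  size=$(wc -c < "$src" | tr -d ' ')
  head -c "$((size / 2))" "$src" > "$dst"
}

# Possible usage:
#   half_testcase big-input.txt half-input.txt
```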

Step 2, option 4: Run B2G on your desktop

If the worst comes to the worst, you can run B2G on your desktop, which probably has much more RAM than your FxOS phone. This is tricky because B2G running on a desktop machine is different in some key ways from B2G running on a phone.

In particular, B2G on desktop machines has multiprocess disabled by default. Multiprocess doesn't work 100% correctly on any desktop platform, but it works most accurately on Linux and Mac. (Follow Bug 923961, Bug 914584, Bug 891882.) You can test on your desktop without multiprocess enabled, but in my experience a lot of our high memory usage issues are caused by our interprocess communication code, so that won't necessarily trigger the bug you're seeing.

It's also not as convenient to take memory reports from a B2G desktop process.  On Linux, you can send signal 34 to the main B2G process and it'll write memory-report-*.gz files out to /tmp.
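On Linux, the signal-based approach might look like the sketch below. The helper name list_memory_reports is hypothetical, B2G_PID is a placeholder for the main B2G process ID that you look up yourself, and /tmp is the output location named above.

```shell
# List memory-report-*.gz files in a directory, newest first.
# list_memory_reports is a hypothetical helper; /tmp is the location named above.
list_memory_reports() {
  dir=${1:-/tmp}
  ls -t "$dir"/memory-report-*.gz 2>/dev/null || true
}

# Possible usage (B2G_PID is a placeholder for the main B2G process ID):
#   kill -34 "$B2G_PID"
#   sleep 5
#   list_memory_reports /tmp | head -n 5
```

The pause between the signal and the listing is a guess; writing the reports may take more or less time depending on how many processes are running.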

One advantage to using B2G desktop builds is that you can use your favorite desktop debugging tools, such as Instruments on Mac OS X. We've had a lot of success with this in the past. To collect a memory report using Instruments on OS X, choose "New -> Mac OS X -> Allocations". Start b2g-desktop and you should see multiple "plugin-container" processes in the activity monitor. You will need two Instruments activities: one to trace the allocations on the main b2g process and another to trace the allocations on the app you wish to analyze. Attach the Instruments activities and execute your test case.

To see how much memory your app is using, analyze the call trees. Tick the "Invert Call Tree" checkbox and sort by bytes used. This will show you which part of your app is using lots of memory. Below is a screenshot of a sample analysis of memory usage for an app:

Screen shot of instruments.

For more information on setting up B2G desktop builds, read our Hacking Gaia page.

Step 3: Analyze the memory report

When you run the memory reporting tool, it opens a memory report in Firefox. This file contains information about the memory usage of all processes on the system. Reading these reports can be a bit overwhelming at first, but it's not so bad once you get the hang of it. Note that you can hover over any leaf node to get a description of what that node describes. What you're looking for is something "unusually large" in the crashing process. You can get an idea of what "unusually large" means by capturing a memory report of your app when it's not using a ton of memory and comparing that to the errant memory report.

Reading memory reports takes some practice, so feel free to ask for help.  The experts on this subject hang out in #memshrink on IRC.

Step 4: Rebuild with DMD, if necessary

One line item that commonly stands out in memory reports captured before apps crash is heap-unclassified. heap-unclassified counts memory allocated by the process that isn't covered by any other memory reporter. If you have high heap-unclassified, the memory report can't tell you anything else about what that memory belongs to. Our tool for digging into heap-unclassified is called DMD. This works on B2G, but you must build B2G yourself in order for it to work because DMD requires local symbols that are only kept on the build machine.

To find out more information on running DMD and interpreting its output, read the Mozilla Wiki DMD page.