Mozilla's getting a new look. What do you think? https://mzl.la/brandsurvey

调试 Firefox OS 的内存溢出错误

当 Firefox OS 设备内存溢出时,低内存管理器以及低内存消息系统就会 kill 掉一些进程,从而保持 OS 运行。 当 kernel 选择 kill 掉前台进程时, 就会看到你正在使用的 app 有一个明显的 crash 现象。 本文主要描述了如何理解和调试 OOM crash 问题。

注意: 如果你并不了解内存条件在 Firefox OS 上是如何管理的,请在阅读本文前参考 Out of memory management on Firefox OS

调试  OOM crash

假设你已经复现了你认为是由于手机内存溢出所导致的 crash 问题。下面的步骤可以帮助我们理解这些错误是如何产生的。 

步骤 1: 验证确实是 OOM 

首先,我们需要检查下是否 crash 是否确实是由于内存溢出所导致的。运行命令  adb shell dmesg 可以查看。如果 app 确实是由于 OOM 被 kill 掉的, 你就会看到下面类似的语句输出:

<4>[06-18 07:40:25.291] [2897: Notes+]send sigkill to 2897 (Notes+), adj 2, size 30625

这一行表示手机的低内存管理器 kill 掉了  Notes+ app (进程 id 2897), 当被 kill 时,其 oom_adj 2 。size 是以 pages 为单位的,每个 page 表示 4kb。因此,在此情况下,   Notes+ app 使用了 30625 * 4kb = 120mb 的内存。

题外话: 如果不是 OOM 

如果你在 dmesg 输出中没有看到类似的信息,你的 crash 可能就不是 OOM。下面的步骤就是将  crash 的进程附加在  gdb 上进行调试,并获取 backtrace,就像下面代码所示:

$ cd path/to/B2G/checkout
$ adb shell b2g-ps
# Note pid of the app that you're going to crash
$ ./run-gdb.sh attach <pid>
(gdb) continue
# crash the app
(gdb) bt

当报出 bug 时, 关注上述命令的输出,同时伴随这  adb logcat 的输出。如果你的 crash  是由于 OOM 导致的,   gdb backtrace 就没有什么用了。 因为 OOM crash 是由发送到 kernel 的信号所触发的,而不是进程中错误代码执行时所导致的。

步骤 2: 收集内存报告

在您已经验证您的 crash 确实是由于 OOM 导致的,下面要做的就是在 app crash 前收集关于手机的内存报告。内存报告会帮助我们理解内存都在哪些方面被使用了。 这个步骤有点困难, 因为一旦 app 出现 crash, 就没有办法从那个进程获取到内存报告。当然也没有办法在 kernel 试图去 kill 一个进程时,才来出发内存报告,那就太晚了。

要从手机上获取内存报告, 首先要更新你的构建树,才能获取到最新版本的相关工具。 repo sync 是不能实现这个要求的;你必须执行  git fetch && git mergegit pull 命令才可以:

$ cd path/to/B2G/checkout
$ git fetch origin
$ git merge --ff-only origin

现在你就可以运行使用下面命令来运行内存报告工具了:

$ tools/get_about_memory.py

一旦获取了你所想要的内存报告,就可以将问题的目录(名称为  about-memory-N)压缩,并附加在相关的 bug 上。但是,只有在你运行这些命令时,你所关注的 app 还活着,并且耗费了大量的内存时,才是管用的。此时也有一些选项。

Step 2, 选项 1: 获取不同的设备Get a different device

一般最简单的方式就是使用包含更多 RAM 的设备。从步骤 1 中获知当产生 crashed 时,进程会使用多少内存,因此你可以简单的等待,直到进程使用了那么多的内存时,就获取内存报告。 b2g-info 工具就会显示出不同的 B2G 进程使用了多少内存。你可以使用下面的命令循环的运行这个工具:

$ adb shell 'while true; do b2g-info; sleep 1; done'

如果在设备上无法获取 b2g-info ,你就可以使用  b2g-procrank 来替代。

Step 2, option 2: Fastest finger

如果无法获得包含更多 RAM 的设备,就可已尝试着去在 app crash 前运行  get_about_memory.py 。 你也可以向上节一样循环运行 b2g-info 来计算何时运行  get_about_memory.py. 运行内存报告时会将手机上的所有进程冻结一段时间,因此在进程 OOM 之前抓取内存报告并不困难。

Step 2, option 3: Use a smaller testcase

We often hit OOMs when doing something like "loading a file of at least size X in the app."

If the app crashes very quickly with a testcase of size X, you could try running a similar but smaller testcase (say, size X/2) and capturing a memory report after that succeeds.  The memory report generated this way often gives us good insights into the OOM crash that we ultimately care about.

Step 2, option 4: Run B2G on your desktop

If the worst comes to the worst, you can run B2G on your desktop, which probably has much more RAM than your FxOS phone.  This is tricky because B2G running on a desktop machine is a different in some key ways from B2G running on a phone.

In particular, B2G on desktop machines has multiprocess disabled by default.  It doesn't really work 100% correctly anywhere, but it works most accurately on Linux and Mac.  (Follow Bug 923961, Bug 914584, Bug 891882)  You can test on your desktop without multiprocess enabled, but in my experience a lot of our high memory usage issues are caused by our interprocess communication code, so that won't necessarily trigger the bug you're seeing.

It's also not as convenient to take memory reports from a B2G desktop process.  On Linux, you can send signal 34 to the main B2G process and it'll write memory-report-*.gz files out to /tmp.

One advantage to using B2G desktop builds is that you can use your favorite desktop debugging tools, such as Instruments on Mac OSX.  We've had a lot of success with this in the past. To collect a memory report using Instruments on OS X, choose "New -> Mac OS X -> Allocations". Start b2g-desktop and you should see multiple "plugin-container" processes in the activity monitor. You will need 2 Instruments activities: 1 to trace the allocations on the main b2g process and another to trace the allocations on the app you wish to analyze. Attach the instrument activities and execute your test case.

To analyze how much memory your app is using, analyze call trees. Check the "Invert Call Tree" tick, and sort by bytes used. This will show you which part of your app is using lots of memory. Below is a screenshot of a sample analysis of memory usage for an app:

Screen shot of instruments.

For more information on setting up B2G desktop builds, read our Hacking Gaia page.

步骤 3: 分析内存报告 

当运行 get_about_memory.py 时,它就会打开 Firefox 中的内存报告。这个文件包含系统上所有进程的内存使用信息。  Note that you can hover over any leaf node to get a description of what that node describes. 你真正要寻找的是在  crashing 进程中 ”异常变大“ 的事物。此处的 "unusually large" means by capturing a memory report of your app when it's not using a ton of memory and comparing that to the errant memory report.

阅读内存报告需要一些实践,因此请在需要的时候就发问吧。这个领域的专家在 #memshrink on IRC.

步骤 4: 必要时使用  DMD 重新构建

One common line item to stick out in memory reports captured before apps crash is heap-unclassifiedheap-unclassified counts memory allocated by the process that isn't covered by any other memory reporter.  If you have high heap-unclassified, the memory report can't tell you anything else about what that memory belongs to. Our tool for digging into heap-unclassified is called DMD.  This works on B2G, but you must build B2G yourself in order for it to work because DMD requires local symbols that are only kept on the build machine.

To find out more information on running DMD and interpreting its output, read the Mozilla Wiki DMD page.
 

文档标签和贡献者

 此页面的贡献者: chrisdavidmills, ReyCG_sub
 最后编辑者: chrisdavidmills,