Triaging crash bugs

You can help make Firefox more stable and secure by identifying valid crash bug reports, making sure they have the information they need, and getting them in the hands of the right developers.

Suggested workflow

We want each crash bug to have complete steps to reproduce, a stack trace, and a reduced testcase (if it involves a web page). Bugs where we can get none of this information should be marked as INCOMPLETE. Bugs where we can only get some of this information should also be considered for INCOMPLETE, e.g. after quickly checking with a developer.

Once a bug has confirmed steps, a stack trace, and a reduced testcase if needed, mark the whiteboard with [ccbr] (short for "complete crash bug report"). Jesse will make sure these bugs get to the right developers.
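The rules above can be summarized as a small decision function. This is only a sketch: the function name and its boolean parameters are hypothetical, and each flag is assumed to describe what could be obtained for the bug after a reasonable effort.

```python
def suggest_outcome(has_steps, has_stack, needs_testcase, has_testcase):
    """Suggest how to mark a crash bug (hypothetical helper, not a Bugzilla API)."""
    required = [has_steps, has_stack]
    # A reduced testcase is only required when the crash involves a web page.
    if needs_testcase:
        required.append(has_testcase)

    if all(required):
        return "[ccbr]"               # complete crash bug report
    if not any(required):
        return "INCOMPLETE"           # none of the information is available
    return "consider INCOMPLETE"      # partial information: check with a developer


print(suggest_outcome(True, True, needs_testcase=True, has_testcase=False))
```

The middle case is deliberately left as a judgment call, since a partial report may still be useful to a developer.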

I recommend selecting a list of crash bugs and processing every bug on the list, one at a time. For example, you could search for crash bugs reported on your OS that do not have "ccbr" or "needs" or "notacrash" in the whiteboard (Mac example).
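As a rough illustration of that search, the filter below keeps only bugs whose summary mentions "crash" and whose whiteboard has none of the three markers. The dictionary keys and the sample bugs are made up for the example; a real list would come from a Bugzilla search.

```python
EXCLUDED_MARKERS = ("ccbr", "needs", "notacrash")


def needs_triage(bug):
    """Return True if a bug still belongs on the triage list.

    `bug` is a dict with hypothetical 'summary' and 'whiteboard' keys.
    """
    summary = bug.get("summary", "").lower()
    whiteboard = bug.get("whiteboard", "").lower()
    if "crash" not in summary:
        return False
    return not any(marker in whiteboard for marker in EXCLUDED_MARKERS)


# Invented sample data, for illustration only.
bugs = [
    {"id": 1, "summary": "Crash when opening PDF", "whiteboard": ""},
    {"id": 2, "summary": "Crash on startup", "whiteboard": "[ccbr]"},
    {"id": 3, "summary": "Crash [@ nsFoo::Bar]", "whiteboard": "[needs stack from reporter]"},
    {"id": 4, "summary": "Toolbar icon misaligned", "whiteboard": ""},
]
todo = [bug["id"] for bug in bugs if needs_triage(bug)]
print(todo)
```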

When you come across a crash bug that you can't move forward yourself, please add a status whiteboard marker indicating what needs to be done. The whiteboard marker should be specific enough that another person reading it can tell whether they can help. This keeps the right eyes on the right bugs, and prevents misunderstandings where two people are waiting for each other.

A few bugs on your list might not actually be crash bugs. You can remove them from the list by adding [notacrash] to the whiteboard or by rewording the summary so that it no longer includes the word "crash".

The goal of processing all crash bugs you see may seem ambitious, but it keeps us from falling into the trap of emergency scanning. When multiple triagers employ emergency scanning, the work of reading bugs is duplicated, and many bugs fall through the cracks.

Complete steps to reproduce

Main article: Bug writing guidelines

If you can reproduce the crash, great! Note your reproduction in the bug and skip the rest of this section.

If you can't reproduce the bug on your computer, and you're using the same OS and Firefox branch as the reporter, chances are the bug does not have complete steps to reproduce. Common missing steps are installing specific extensions, setting preferences, or even having a corrupt file in the Firefox profile. Work with the reporter to figure out what these missing steps are; a good first step is to try Firefox safe mode or a new profile.

If the crash is described as happening "seemingly at random", it's unlikely that you'll figure out steps, although you might be able to find out whether a fresh profile makes the bug go away. Your best bet in this case is showing a developer a stack trace and hoping it is useful. (This is the part where you can be glad the bug is a crash: at least you can get a stack trace!)

If you suspect the bug is unlikely to be a Firefox bug, you can send the reporter to the article about crashes. Let the reporter know that they should file a new bug if the people there can help them figure out the cause.

Don't be afraid to ask bug reporters to do difficult things, like making a debug build or reducing a testcase, if doing those things on their computer is the only way to figure out the bug.

Don't be afraid to mark bugs as WORKSFORME or INCOMPLETE. A non-actionable bug isn't helping anyone, and may confuse future bug reporters into thinking the bug has already been (usefully) reported.

Can't figure out or confirm the steps? Unless you think we should give up on figuring out steps, use a whiteboard marker such as:
  • [needs reporter to try in Firefox 3.5]
  • [needs reporter to try a fresh profile]
  • [needs reproduction attempt on Linux]
  • [needs testing by a Wells Fargo account holder]

Stack traces

Main article: How to get a stacktrace for a bug report

Stack traces are useful for getting the bug to the right developer, and are sometimes enough for a developer to figure out the bug.

If you can reproduce the crash yourself, you can use about:crashes to get us a stack trace. If only the reporter can reproduce, point them to instructions.

If you have a debug build on Mac or Linux, it can be helpful to get a stack trace from gdb, since this contains more information than a Breakpad crash report. In gdb, type bt to get a normal stack trace or bt full to get one that includes the values of local variables. See Debugging Mozilla with gdb for more information.

If others have commented with Breakpad IDs, timeless's tool makes it easy to look at all of them.

Once you have a stack trace, add the crash signature to the summary, in the form [@ nsFoo::Bar]. Then search for other bugs with the same signature in the summary; you might discover that the bug report is a duplicate.
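The [@ nsFoo::Bar] convention is easy to handle mechanically. The helpers below are a sketch with hypothetical names: one appends a signature to a summary in that form, the other pulls signatures back out of a summary so they can be compared when hunting for duplicates.

```python
import re

# Matches "[@ signature]" and captures the signature text.
SIGNATURE_RE = re.compile(r"\[@\s*([^\]]+?)\s*\]")


def add_signature(summary, signature):
    """Append a crash signature to a bug summary unless it is already there."""
    tag = f"[@ {signature}]"
    if tag in summary:
        return summary
    return f"{summary} {tag}"


def extract_signatures(summary):
    """Return every crash signature found in a bug summary."""
    return SIGNATURE_RE.findall(summary)


summary = add_signature("Crash on startup", "nsFoo::Bar")
print(summary)
print(extract_signatures(summary))
```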

Optionally, you can look at the crash report in more detail:

  • Pick a Bugzilla component based on the functions at the top of the stack.
  • Look at hg blame for the functions near the top of the stack to see which developers touch the code.
  • If the top line of the stack trace is a hex address, and a web page can trigger the crash, it's probably exploitable. Contact the security team for prioritization and possibly making the bug report private.

Optionally, you can search crash-stats for other reports with the same signature:

  • If you discover that there are many reports, you can add the "topcrash" keyword.
  • If you discover that there are no recent reports, that might be a sign that the bug has been fixed since being filed.
  • Comments in crash reports might help with reproduction.
  • For crashes found in nightlies, the graph might tell you approximately when the bug was introduced, giving you a rough regression window.

If you have a stack but no hope of getting steps to reproduce, give the bug to an appropriate developer, with instructions to mark the bug as INCOMPLETE immediately if the stack doesn't provide enough information to fix the bug.

Can't get a stack yourself? Use a whiteboard marker such as:
  • [needs stack from reporter]
  • [needs stack from a Linux user]

Reduced testcases

Main article: Reducing testcases

When a crash is triggered by loading a specific web page, the bug report needs a reduced testcase to be complete.

A reduced testcase makes it much easier for Mozilla developers to understand the bug. It also keeps the bug report useful if the web site changes. Once the bug is fixed, it can become an automated test, guaranteeing the same problem will not surface again.

Once you have a reduced testcase, rewrite the summary to refer to the testcase's ingredients, and try to pick an appropriate Bugzilla component. Then search for duplicates again based on the words you put in (or thought of putting in) the summary.

Identified a bug as needing a reduced testcase, but not up to the task at the moment? Consider attaching a self-contained but non-reduced testcase, and use a whiteboard marker such as:
  • [needs testcase reduction]
  • [needs testcase reduction on Mac]
  • [needs reporter to reduce testcase]
  • [needs andreas to try making a testcase from scratch]

Regression windows

Main article: Finding a regression window

Determining when the bug was introduced usually also identifies the code responsible for the crash. But it is time-consuming, so it isn't worth doing for every bug. Find a regression window if a developer asks for one, or if you think blame will help get the bug fixed.

You'll probably want to start with a binary search among nightly builds. As of 2009, the available nightly builds go back to February 2004.
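The binary search itself can be sketched as follows. The function name, the build labels, and the crashes callback are all hypothetical; in practice "checking a build" means downloading that nightly and trying the steps to reproduce by hand. The sketch assumes the oldest build in the list is known to be good and the newest is known to crash.

```python
def find_regression_window(builds, crashes):
    """Narrow down the pair of adjacent builds where a crash first appears.

    builds:  list of build labels sorted oldest to newest; builds[0] is
             assumed good and builds[-1] is assumed bad.
    crashes: callable taking a build label, returning True if it crashes.
    Returns (last_good_build, first_bad_build).
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if crashes(builds[mid]):
            bad = mid    # the regression is at or before this build
        else:
            good = mid   # the regression is after this build
    return builds[good], builds[bad]


# Simulated example: pretend the crash first appears in the 2009-03-17 nightly.
nightlies = [f"2009-03-{day:02d}" for day in range(1, 32)]
window = find_regression_window(nightlies, lambda build: build >= "2009-03-17")
print(window)
```

With 31 nightlies this takes four or five checks instead of testing every build, which is why bisecting pays off even when each check is manual.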

If you want a narrower regression range, you'll need to check out the source code using Mercurial. The hg bisect command can help. Several contributors have also written tools to automate the process completely.

Decided that a bug needs a regression window, but not up to the task at the moment? Add keywords such as these to the keyword field:
  • regression, regressionwindow-wanted

Remote debugging

Main article: Remote debugging

When a bug is reproducible by a community member but not by a developer, remote debugging might be preferable to marking the bug as INCOMPLETE. Remote debugging is inefficient, so it should be considered a "last resort" for when it is impossible to identify the relevant difference in setups.

Example whiteboard markers:
  • [needs reporter to build debug and get a core dump]
  • [needs bz and jesse to catch each other on IRC]