Firefox now has a built-in profiler. Having a profiler in the code base lets us, among other things, better measure responsiveness, charge performance costs more accurately, and run in environments and platforms in which external profilers aren't available, such as in the user's environment or on a locked Android device. This article details how to use the profiler.
Getting the Profiler Add-on
The built-in profiler has two interfaces. For Web developers there is a simplified profiler that can be opened from the menu Tools > Web Developer > Performance. A more advanced interface for developers of Mozilla's internals can be obtained by installing Benoit Girard's Gecko Profiler add-on. Once installed, you should customize the Firefox toolbar (using the "Customize" option in the hamburger menu) and add the Profiler icon to the toolbar.
Using the Add-on
Reporting a Performance Problem has a step-by-step guide for obtaining a profile when requested by Firefox developers.
Reporting a Thunderbird Performance Problem has a step-by-step guide for obtaining a profile when requested by Thunderbird developers.
The profiler uses a fixed-size buffer to store a few seconds' worth of samples. When it runs out of space it discards the oldest entries. When you stop the profiler it throws away its buffer. When you start the profiler it creates the buffer again and begins to fill it. When a profile is taken, it is sent for viewing to a web-based application called Cleopatra.
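The buffer behaviour described above can be illustrated with a minimal Python sketch. This is not the profiler's actual implementation; the class name SampleBuffer and the capacity of 3 are made up for illustration.

```python
from collections import deque


class SampleBuffer:
    """Hypothetical fixed-size sample store that drops the oldest entries."""

    def __init__(self, capacity):
        # deque(maxlen=...) discards items from the left once it is full.
        self._entries = deque(maxlen=capacity)

    def add(self, sample):
        self._entries.append(sample)

    def snapshot(self):
        """Return the samples currently retained, oldest first."""
        return list(self._entries)


buf = SampleBuffer(capacity=3)
for sample in ["s1", "s2", "s3", "s4", "s5"]:
    buf.add(sample)
print(buf.snapshot())  # ['s3', 's4', 's5']
```

Only the most recent samples survive, which is why a profile should be taken soon after the behaviour of interest.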
Click the profiler icon in your toolbar to open the profiler panel. To take a profile you can use the buttons in the profiler panel or the keyboard shortcuts.
Ctrl+Shift+1: Start/Stop the profiler
Ctrl+Shift+2: Take a profile and launch Cleopatra to view it
Understanding Cleopatra Profiles
In the results above we can see that we're spending ~113 milliseconds in cp_handleEvent() when running BrowserElementPanning.js, line 77. That means 113 ms for cp_handleEvent and ALL child functions that it calls. We spend 56 ms of that in cp_onTouchStart. We then spend 41 ms in cp_getPannable, 40 ms of which is spent in cp_findPannable. Essentially, you're looking for the functions that take the most time, so that you can figure out how to optimize them.
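The way a function's time includes the time of everything it calls can be sketched in Python. The four totals (113, 56, 41 and 40 ms) come from the example above; the per-function self times and the assumption that the functions form a single call chain are made up so that the totals add up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    self_ms: int  # time spent in the function itself (assumed values)
    children: list = field(default_factory=list)


def running_time(node):
    """Time in the function plus all functions it calls."""
    return node.self_ms + sum(running_time(child) for child in node.children)


find = Node("cp_findPannable", 40)
get = Node("cp_getPannable", 1, [find])
touch = Node("cp_onTouchStart", 15, [get])
handle = Node("cp_handleEvent", 57, [touch])

for node in (handle, touch, get, find):
    print(node.name, running_time(node))
```

A function with a large total but a small self time is usually only a path to the real cost, which sits further down the chain.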
Understanding Frames In Cleopatra
If you profile the Compositor thread in the main b2g process, you will get a frames view each time a new frame was produced. It will look something like this:
We want to take a look at the Frames view. We see a couple of things:
- Vsync [Orange] - This is really a vsync marker. It is available when the hardware composer on the device is supplying vsync events. Since a vsync occurs at a single point in time, it is drawn as a block only for visual readability. The actual vsync occurred at the start of the vsync box. See Project Silk for why this is important.
- Refresh [Green] - This is what's occurring during an nsRefreshDriver::Tick phase.
- DisplayList [Red] - The amount of time taken to create a display list.
- Reflow [Green] - Amount of time spent during reflow.
- Composite [Purple] - Amount of time spent performing a composite.
- LayerTransaction - Amount of time spent performing a layer transaction.
Profiling Boot to Gecko (with a real device)
There is a script called
- You need to have a local build of B2G. Make sure you build with export B2G_PROFILING=1 in your .userconfig file (see Customizing your .userconfig). This will not work with prebuilt binaries. Note: if you have a debug build (
.userconfig), there is no need to additionally export
- You need to have your phone plugged into your PC and have it accessible via ADB.
The general steps for using the profiler are:
- Start the app you want to profile and perform all the steps required to get you up to the point just before the slow action that you want to investigate.
- Start the profiler.
- Perform all the actions you want to investigate on the phone.
- Capture the profile and stop the profiler.
- Upload and Share the profile to Cleopatra.
Start the Profiler
TL;DR: You should read the instructions below for more details, but your general command structure for profiling rendering will be ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p YOUR APP NAME HERE, for example ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings. You can find the list of running apps with ./profile.sh ps. Run ./profile.sh capture after performing the actions you want to profile. You have to profile the B2G app at all times.
Starting the profiler is done separately for each process. The general guideline is to start the profiler on the B2G parent process' compositor thread (effectively the window manager) and on the app you want to profile. For example, ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings will start the profiler on the B2G compositor and on the Settings app. If you start the profiler with this command, you should get output like this:
    ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings
    Process: b2g
    Using default features js,leaf
    Starting profiling PID 500..
    Profiler started
    Process: Settings
    Using default features js,leaf
    Starting profiling PID 611..
    Profiler started
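If you start the profiler often, the two-part command can be assembled by a small helper. The Python sketch below only builds the command line from the -p and -t flags documented in this article; the function names are hypothetical and nothing is executed.

```python
import shlex


def start_commands(app, b2g_threads=("Compositor",)):
    """Build the two 'profile.sh start' invocations as argument lists."""
    # Multiple threads go into a single comma-separated -t argument.
    b2g = ["./profile.sh", "start", "-p", "b2g", "-t", ",".join(b2g_threads)]
    target = ["./profile.sh", "start", "-p", app]
    return [b2g, target]


def as_shell_line(commands):
    """Join the invocations with && so the second runs only if the first succeeds."""
    return " && ".join(shlex.join(cmd) for cmd in commands)


print(as_shell_line(start_commands("Settings")))
# ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings
```

Process names are passed through unchanged, so they must match the case shown by ./profile.sh ps.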
./profile.sh start, without any arguments, will reset the device and start the profiler on all processes once the phone reboots. This mode is deprecated.
If you don't know the name of your app or you get an error saying a process doesn't exist, try running ./profile.sh ps. You should get something like this:

    ./profile.sh ps
      PID Name
    ----- ----------------
      845 b2g              profiler not running
      893 (Nuwa)           profiler not running
      909 Homescreen       profiler not running
      937 Messages         profiler not running
     1024 (Preallocated a  profiler not running
Note: Process names are CASE SENSITIVE.
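Because process names are case sensitive, it can help to look up the exact spelling before starting the profiler. The following Python sketch parses output shaped like the example above. The column layout is inferred from that example only, and the helper names parse_ps and resolve_name are hypothetical.

```python
import re

# A data row is: PID, process name, then a status beginning with "profiler".
PS_ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(profiler\b.*)$")

SAMPLE = """\
  PID Name
----- ----------------
  845 b2g              profiler not running
  893 (Nuwa)           profiler not running
  909 Homescreen       profiler not running
  937 Messages         profiler not running
 1024 (Preallocated a  profiler not running
"""


def parse_ps(output):
    """Return (pid, name, status) tuples; header lines are skipped."""
    rows = []
    for line in output.splitlines():
        match = PS_ROW.match(line)
        if match:
            rows.append((int(match.group(1)), match.group(2), match.group(3)))
    return rows


def resolve_name(rows, wanted):
    """Return the exact process name matching 'wanted', ignoring case, or None."""
    names = [name for _, name, _ in rows]
    if wanted in names:
        return wanted
    close = [name for name in names if name.lower() == wanted.lower()]
    return close[0] if close else None


rows = parse_ps(SAMPLE)
print(resolve_name(rows, "homescreen"))  # Homescreen
```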
./profile.sh start: Starts the profiler on a specific process / thread. For example, profiling the 'Compositor' thread on the B2G process is useful for profiling scrolling/drawing. The associated flags are:
-p: Which process to profile (b2g, Email, etc.).
-e: The number of profile entries to capture, which determines how much profile data the profiler should keep. This is a circular buffer.
-s: The stack scan mode, as described under MOZ_PROFILER_STACK_SCAN below.
-i: The sampling interval, in milliseconds.
-m: The profiler mode.
-t: Which threads to profile. To specify multiple threads, use a single argument with comma-separated names, e.g. -t Compositor,GeckoMain. (Do not specify a second -t argument, it will just override the first.) Note that the thread name of the Gecko main thread is
./profile.sh ps will show running B2G processes and whether the profiler was enabled for those processes or not.
./profile.sh capture [pid or name] will initiate a capture. If you don't specify any arguments, then all currently running B2G processes will be captured. Otherwise the B2G process with the indicated pid or name will be captured. The profile script will pull the profile files from the phone and add symbols.
- The profile script uses some variables from the file .var.profile, which is generated by the build. These will allow the script to locate your objdir-gecko tree, the appropriate toolchain, and the out/target/product/<phone> tree to get symbols for the Android libraries.
.txt files will be renamed when pulled to the host and will have the following pattern:
.sym) If your capture includes multiple processes then they'll all have the same HHMM portion. The PID will be the PID of the process, and NAME will be the app name (as per the
.sym files (which are the .txt files with symbols added) can then be uploaded to the Cleopatra UI.
./profile.sh stop will kill the currently running b2g and restart it normally (i.e. profiling disabled).
Some extra commands available in the profile script (these will not be needed normally, but can be useful if you're working on the script):
./profile.sh ls will show all of the profile files stored on the phone (it looks in the
./profile.sh ps will show all Gecko processes and whether they are being profiled.
./profile.sh signal [pid] triggers the profiler to store the current profile buffers to files on the phone.
./profile.sh pull pid [NAME [HHMM]] will pull the profile file for the indicated pid and rename it as mentioned above.
./profile.sh symbolicate filename will take the profile_HHMM_PID_NAME.txt file and create profile_HHMM_PID_NAME.sym, which has symbols in it.
./profile.sh help will print out all of the commands currently supported by the script.
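When sorting through several pulled files, the profile_HHMM_PID_NAME naming pattern can be split back into its parts. The Python sketch below is an illustration only: the regular expression is an assumption based on the pattern named above, and it treats the HHMM part as optional because the sample capture output in this article shows names without it.

```python
import re

PROFILE_NAME = re.compile(
    r"^profile_(?:(?P<hhmm>\d{4})_)?(?P<pid>\d+)_(?P<name>.+)\.(?P<ext>txt|sym)$"
)


def parse_profile_name(filename):
    """Split a pulled profile file name into hhmm, pid, name and extension."""
    match = PROFILE_NAME.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    parts["pid"] = int(parts["pid"])
    return parts


print(parse_profile_name("profile_1234_500_b2g.txt"))
print(parse_profile_name("profile_611_Settings.sym"))
```

A merged file such as profile_captured.sym does not follow the pattern, so the helper returns None for it.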
Other Ways to Profile B2G:
Capture the Profile
Once you have finished all the actions under investigation, you need to capture the profile. You can capture the profile by running ./profile.sh capture. Your output should be something like this:

    ./profile.sh capture
    Signaling Profiled Processes: 500 611
    Stabilizing 500 b2g ...
    Pulling /data/local/tmp/profile_0_500.txt into profile_500_b2g.txt
    Adding symbols to profile_500_b2g.txt and creating profile_500_b2g.sym ...
    Stabilizing 611 Settings ...
    Pulling /data/local/tmp/profile_2_611.txt into profile_611_Settings.txt
    Adding symbols to profile_611_Settings.txt and creating profile_611_Settings.sym ...
    Merging profile: profile_500_b2g.sym profile_611_Settings.sym
    Results: profile_captured.sym
    Removing old profile files (from device) ... done
Important: If you do not see the line Results: profile_captured.sym, YOUR PROFILE WAS NOT SUCCESSFULLY CAPTURED. Try it again. This should be very uncommon now.
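If you drive the capture from a script, the check for the Results line can be automated. The Python sketch below assumes the output format shown in the example above; the function name captured_result is hypothetical.

```python
def captured_result(output):
    """Return the file named on the 'Results:' line, or None if it is missing."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Results:"):
            return line[len("Results:"):].strip()
    return None


good = """Merging profile: profile_500_b2g.sym profile_611_Settings.sym
Results: profile_captured.sym
Removing old profile files (from device) ... done"""

bad = """Signaling Profiled Processes: 500 611
Stabilizing 500 b2g ..."""

print(captured_result(good))  # profile_captured.sym
print(captured_result(bad))   # None
```

A return value of None means the capture should be repeated.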
If you use a build from pvtbuilds and you don't have the symbols locally, you can capture using the
./profile.sh capture -s http://symbolapi.mozilla.org
Stop the Profiler
You can stop the profiler now: Run ./profile.sh stop, and your phone should reboot.

    ./profile.sh stop
    Profiler appears to be running. Killing b2g ........
    b2g doesn't seem to want to go away. Try rebooting.
Upload and Share the Profile to Cleopatra
You should now have a file called profile_captured.sym. Head on over to Cleopatra to view the results: under Upload your profile here, click the Browse... button and select the
After a few seconds, you should see something like this:
The next best thing you can do is to share it. Push the Share button on the bottom left of the interface:
After it finishes uploading, the URL should be something like https://people.mozilla.org/~bgirard/cleopatra/#report=03e8dc46769c50751c23cbb9d707e980f96f56b5. You can now send that link to whoever you want to share the results with!
Profiling local Windows builds
If you built Firefox for Windows locally and you would like to use the local symbols with the profiler, you will need to run an additional tool; see Profiling with the Built-in Profiler and Local Symbols on Windows.
Profiling Firefox mobile
- For local builds of Fennec, you should build with optimization and STRIP_FLAGS="--strip-debug" but NOT with --enable-profiling. Nightly builds are already built with the appropriate flags.
- You'll need to have arm-eabi-addr2line (which is part of the Android NDK) in your bash PATH, so use locate arm-eabi-addr2line (on Linux) or mdfind name:arm-eabi-addr2line (on OS X) and add an export of its location to ~/.bash_profile. The extension will invoke bash to use
- Install the latest pre-release build in your host machine's Firefox browser that has your phone reachable via ADB. This will add an icon in the top right of the browser.
- Select target Mobile USB and press Connect. The first run will take about an extra minute to pull in the required system libraries.
Profiling JS benchmark (xpcshell)
- You'll need a custom build of the xpcshell, including the following patches: the 100µs sampling patch (bug 807854) and, on Linux, the experimental patch to enable native stacks (bug 812946).
- To profile the script run.js with IonMonkey (-I), type inference (-n) and JägerMonkey (-m), use the following command:

    $ xpcshell -m -I -n -e '
        const Ci = Components.interfaces;
        const Cc = Components.classes;
        var profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler);
        profiler.StartProfiler(
          10000000 /* = profiler memory */,
          1 /* = sample rate: 100µs with patch, 1ms without */,
          ["stackwalk", "js"], 2 /* = features, and number of features. */
        );
      ' -f ./run.js -e '
        var profileObj = profiler.getProfileData();
        print(JSON.stringify(profileObj));
      ' | tail -n 1 > run.cleo

  xpcshell prints all the benchmark information and, on its last line, the result of the profiling. You can filter it with tail -n 1 and redirect it to a file to prevent printing it in your shell. The expected size of the output is around 100 MB.
- To add symbols to your build, you need to call ./scripts/profile-symbolicate.py, available in the B2G repository. If libraries are not found, you will need to patch the script with bug 812063's attachment.

    $ GECKO_OBJDIR=<objdir> PRODUCT_OUT=<objdir> TARGET_TOOLS_PREFIX= \
      ./scripts/profile-symbolicate.py -o run.symb.cleo run.cleo
- Clone Cleopatra and start the server with
- Access Cleopatra from your web browser by loading the page localhost:8000, and upload run.symb.cleo to render the profile with most of the symbol information.
Native stack vs. Pseudo stack
The profiler periodically samples the stack(s) of thread(s) in Firefox, collecting a stack trace, and presents the aggregated results using the Cleopatra UI. Stack traces can be collected in two different ways: Pseudostack (the default) or Nativestack.
With Pseudostack, we sidestep the difficulties and performance overheads of unwinding stacks in a robust and platform independent way by using function entry/exit tags added by hand to important points in the code base. The stacks you see in the UI are chains of these tags. This gives robust stacks that work on all platforms, but they miss out on un-annotated areas of the code base, and give no visibility into system libraries or drivers.
Tagging is done by adding macros of the form PROFILER_LABEL("NAMESPACE", "NAME"). These add RAII helpers, which are used by the profiler to track entries/exits of the annotated functions. For this to be effective, you need to liberally use PROFILER_LABEL throughout the code. See GeckoProfiler.h for more variations like
Because of the small overhead of the instrumentation, the sample label shouldn't be placed inside hot loops. A profile reporting that a large portion of time is spent in "Unknown" code indicates that the area being executed doesn't have any sample labels. As we focus on using this tool and add additional sample labels, coverage should improve.
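To show how entry/exit tags produce a pseudostack, here is a small Python analogy. It is not the Gecko implementation: a context manager stands in for the C++ RAII helper, and the label names are made up.

```python
from contextlib import contextmanager

pseudo_stack = []  # labels of the annotated functions currently executing


@contextmanager
def profiler_label(namespace, name):
    """Push a label on entry and pop it on exit, even if an exception occurs."""
    pseudo_stack.append(f"{namespace}::{name}")
    try:
        yield
    finally:
        pseudo_stack.pop()


def sample():
    """What a sampler would record: a copy of the current label chain."""
    return tuple(pseudo_stack)


samples = []
with profiler_label("layout", "Reflow"):
    samples.append(sample())
    with profiler_label("gfx", "Paint"):
        samples.append(sample())
    samples.append(sample())
samples.append(sample())  # no label is active here

for s in samples:
    print(s)
```

The last sample is empty because no annotated code was running, which corresponds to the time a real profile cannot attribute to any label.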
Nativestack is an optional, platform specific feature that isn't complete yet. The goal is to provide "native" — that is, real — stacktraces on platforms that support it. Having this feature will give us detailed stacks and help us analyze problems where we're spending time in drivers and system libraries. We're working on building the proper stack walking and symbolization required to make this step work, and are looking for help with this feature.
The profiler will operate in either Pseudostack or Nativestack mode depending on your environment. See above for details on these.
| Platform | Custom Build | Nightly | Release (Gecko 15.0+) |
|---|---|---|---|
| Windows | Native stack (Custom steps) | Native stack | Pseudo stack |
| Mac | Native stack | Native stack | Pseudo stack |
| Linux | Pseudo stack (Bug for Native stack) | Pseudo stack (Bug for Native stack) | Pseudo stack |
| Fennec | Pseudo stack (Bug for Native stack) | Pseudo stack (Bug for Native stack) | Pseudo stack (19+) |
| B2G | Native stack (EHABI unwinds) | Pseudo stack | None (Bug) |
Using native stack unwinding on 32- and 64-bit Linux
Nightly builds for 32- and 64-bit Linux now have native stack unwinding via Breakpad available. This is controlled by a set of environment variables if you profile using a clean reboot of the phone (e.g. ./profile.sh start). Otherwise, these variables are passed in via the profile.sh script. Here are some recommended settings. I suggest you use all of them, and adjust as appropriate.
MOZ_PROFILER_VERBOSE=1: This makes the logging output a bit more verbose, which helps to diagnose possible problems reading or using the Dwarf CFI (unwind information) that is used.
MOZ_PROFILER_INTERVAL=50: This sets the sampling interval to the specified number of milliseconds. You can reduce this down to 1 millisecond, but I'd recommend you do some trial runs at 50 milliseconds and gradually reduce the interval. Native unwinding can be expensive, so you can end up with Firefox or Fennec being unresponsive if you set the interval too low.
MOZ_PROFILER_MODE=native: This controls how stack unwinding is done, and can take three values. With native, it uses only Breakpad to unwind the stacks. With pseudo, the stacks are pseudostacks only, as described above. With combined, both a native and a pseudo stack trace are obtained for each sample point and are interleaved, based on observed stack pointer values, to create a combined trace. You can also set this to help to get a summary of all of these options.
MOZ_PROFILER_STACK_SCAN=0 (zero): Breakpad has multiple different schemes for unwinding the stack, of varying levels of trustworthiness: using Dwarf CFI data, using frame pointers, and scanning the stack looking for probable return addresses. This last scheme is used when nothing else works. It can generate useful data, but can also add frames that are not really present, which is very confusing. By default, stack scanning is disallowed. You can selectively re-enable it by changing the value to 1, 2, 3, etc. This limits the number of frames obtained by stack scanning to the specified number, and truncates the trace if any more stack-scanned frames are found. This is best left at the default setting (zero). If, however, you absolutely need the profiler to unwind through some library in which it is getting stuck, try increasing it gradually, but be aware that you may get bogus stack traces as a result.
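The recommended settings above can be collected in one place when launching the browser from a script. The Python sketch below only assembles an environment dictionary from the variables listed in this section; the helper name profiler_env is hypothetical, and the validation rules are assumptions drawn from the descriptions above.

```python
import os

RECOMMENDED = {
    "MOZ_PROFILER_VERBOSE": "1",
    "MOZ_PROFILER_INTERVAL": "50",   # milliseconds
    "MOZ_PROFILER_MODE": "native",
    "MOZ_PROFILER_STACK_SCAN": "0",  # stack scanning disallowed
}

VALID_MODES = {"native", "pseudo", "combined", "help"}


def profiler_env(overrides=None, base=None):
    """Return a copy of the environment with the profiler variables set."""
    env = dict(os.environ if base is None else base)
    env.update(RECOMMENDED)
    env.update(overrides or {})
    if env["MOZ_PROFILER_MODE"] not in VALID_MODES:
        raise ValueError("unknown MOZ_PROFILER_MODE: " + env["MOZ_PROFILER_MODE"])
    if int(env["MOZ_PROFILER_INTERVAL"]) < 1:
        raise ValueError("MOZ_PROFILER_INTERVAL must be at least 1 ms")
    return env


env = profiler_env({"MOZ_PROFILER_INTERVAL": "10"}, base={})
print(env)
```

The resulting dictionary can be passed as the env argument of subprocess.Popen together with the path to your own build.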
If you still get a pseudostack instead of a native stack trace, try enabling both the "Stackwalk" and "Breakpad" options in the profiler options.
Profile Fails to Upload
You can upload profiles up to about 10 MB in size to the public central storage (AppEngine). For bigger profiles you will have to download the profile and then either
- Share the file or
- Host the file yourself while allowing Access-Control-Allow-Origin *. For apache (people.mozilla.org) use $ echo "Header set Access-Control-Allow-Origin *" > .htaccess and share the URL http://people.mozilla.com/~bgirard/cleopatra/?customProfile=<URL>, replacing <URL> with the location of your profile file.
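The choice between uploading and self-hosting can be sketched in Python. The 10 MB cutoff is only the approximate figure quoted above, the function names are hypothetical, and the link is built by plain substitution exactly as described; whether the profile location needs URL-encoding is not covered here.

```python
UPLOAD_LIMIT_BYTES = 10 * 1024 * 1024  # "about 10 MB"; the exact cutoff is an assumption
CUSTOM_PROFILE_TEMPLATE = (
    "http://people.mozilla.com/~bgirard/cleopatra/?customProfile=<URL>"
)


def fits_public_storage(size_bytes):
    """True if the profile is small enough for the public central storage."""
    return size_bytes <= UPLOAD_LIMIT_BYTES


def custom_profile_link(profile_url):
    """Replace <URL> in the template with the location of a self-hosted profile."""
    return CUSTOM_PROFILE_TEMPLATE.replace("<URL>", profile_url)


print(fits_public_storage(25 * 1024 * 1024))  # False
print(custom_profile_link("http://example.com/profile_captured.sym"))
```

The example.com address is a placeholder for wherever you host the file with the Access-Control-Allow-Origin header set.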
Profiling a hung process
It is possible to get profiles from hung Firefox processes using lldb.
- After the process has hung, attach lldb.
- Type in:
- Clone mstange’s handy profile analysis repository.
- Run python symbolicate_profile.py somepath/profile.txt to graft symbols into the profile. mstange’s scripts do some fairly clever things to get those symbols: if your Firefox was built by Mozilla, then it will retrieve the symbols from the Mozilla symbol server. If you built Firefox yourself, it will attempt to use some cleverness to grab the symbols from your binary.
Your profile will now, hopefully, be updated with symbols.
Then, load up Cleopatra, and upload the profile.
I haven’t yet had the opportunity to try this, but I hope to next week. I’d be eager to hear people’s experience giving this a go – it might be a great tool in determining what’s going on in Firefox when it’s hung!
SPS has rudimentary support for profiling multiple threads. To enable it, check the 'Multi-Thread' box then enter one or more thread names into the textbox beside it. Thread names are the strings passed to the base::Thread class at initialization. At present there is no central list of these thread names, but you can find them by grepping the source.
If the filter you entered is invalid, no threads will be profiled. You can identify this by hitting Analyze (Cleopatra will show you an error message). If the filter is left empty, only the main thread is captured (as if you had not enabled Multi-Thread.)
The profiler supports several features. These are options to gather additional data in your profiles. Each option will increase the performance overhead of profiling so it's important to activate only options that will provide useful information for your particular problem to reduce the distortion.
This feature is deprecated. The goal was to only record samples while the browser was not responsive.
When taking a sample, the profiler will attempt to unwind the stack using platform-specific code appropriate for the ABI. This will provide an accurate call stack for most samples. On ABIs where frame pointers are not available this will cause a significant performance impact.
This feature is currently deprecated.
Main Thread IO
This will interpose file I/O and report it in the profiles.
This will sample other threads. This field accepts a comma-separated list of thread names. A thread can only be profiled if it is registered with the profiler.
Use the Intel Power Gadget driver to tag each sample with the power state of the CPU.
This will insert a timer query during compositing and show the result in the Frames view. This will approximate how much GPU time was spent compositing each frame.
Layers & Texture
The profiler can be used to view the layer tree at each composite, optionally with texture data. This can be used to debug correctness problems.
Viewing the Layer Tree
To view the layer tree, the layers.dump pref must be set to true in the Firefox or B2G program being profiled.
Note: in B2G, layer dumping can also be enabled from the Developer menu in Settings.
In addition, both the compositor thread and the content thread (in the case of B2G, the content thread of whichever app you're interested in) must be profiled. For example, on B2G, when profiling the Homescreen app, you might start the profiler with:
./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Homescreen
Having gotten a profile this way, the layer tree for a composite can be seen by clicking on a composite in the "Frames" section of Cleopatra (you may need to zoom in on a sub-range of samples to make individual composites large enough to be clicked). This will activate the "LayerTree" tab:
In this screenshot, Composite #143 has been selected. The layer tree structure can be seen in the left panel. It contains, for each layer, the type of the layer, and various metrics about the layer, such as the visible region and any transforms. In the right panel, a visualization of the layer tree (based entirely on the aforementioned metrics) is shown. Hovering over a layer in the left panel highlights the layer in the right panel. This is useful for identifying what content each layer corresponds to. Here, I'm hovering over the last layer in the layer tree (a PaintedLayerComposite), and a strip at the top of the right panel is highlighted, telling me that this layer is for the system notification bar in B2G.
Sometimes, it's useful to see not only the structure of the layer tree for each composite, but also the rendered textures for each layer. This can be achieved by additionally setting the layers.dump-texture pref to true, or by adding -f layersdump to the profiler command line (the latter implies both the
Warning: Dumping texture data slows performance considerably, and requires a lot of storage for the profile files. Expect rendering to happen at a significantly reduced frame rate when profiling this way, and keep the duration of the capture short, to ensure the samples of interest aren't overwritten.
Here's how the Layer Tree view looks in Cleopatra with texture data:
This time, the visualization in the right panel shows the actual textures rather than just the outlines of the layers. This can be very useful for debugging correctness problems such as a temporary visual/rendering glitch, because it allows you to find the precise composite that shows the glitch, and look at the layer tree for that composite.
Visualizing a layer tree without a profile
If you have a layer dump from somewhere (such as from adb logcat on B2G), you can get Cleopatra to visualize it (just the structure of course, not textures) without needing a profile. To do so, paste the layer dump into the "Enter your profile data here" text field on the front page of Cleopatra:
The resulting "profile" will have the Layer Tree view enabled (but nothing else). This is useful in cases where you want to gain a quick visual understanding of a layer dump without having to take a profile.
On B2G, each line of a layer dump in adb logcat output is prefixed with something like I/Gecko (30593):. Cleopatra doesn't currently understand this prefix, so it needs to be removed before pasting.
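Removing the prefix by hand is tedious for long dumps, so a small filter can help. The Python sketch below assumes the prefix always looks like the I/Gecko (30593): example above, followed by one space. The sample layer lines are placeholders, not real dump output.

```python
import re

# One log-level letter, "/Gecko", a process id in parentheses, a colon, one optional space.
LOGCAT_PREFIX = re.compile(r"^[A-Z]/Gecko\s*\(\s*\d+\): ?")


def strip_logcat_prefix(line):
    """Remove the logcat prefix but keep the dump's own indentation."""
    return LOGCAT_PREFIX.sub("", line, count=1)


def clean_dump(text):
    return "\n".join(strip_logcat_prefix(line) for line in text.splitlines())


raw = (
    "I/Gecko (30593): placeholder root layer\n"
    "I/Gecko (30593):   placeholder child layer\n"
)
print(clean_dump(raw))
```

Indentation after the prefix is preserved because the dump uses it to express the nesting of layers.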
Dump the display list after each refresh with the texture data. This can be used to debug correctness problems.