    Profiling with the built-in profiler

    Firefox now has a built-in profiler. Having a profiler in the code base lets us, among other things, better measure responsiveness, charge performance costs more accurately, and run in environments and platforms in which external profilers aren't available, such as in the user's environment or on a locked Android device. This article details how to use the profiler.

    Getting the Profiler Add-on

    The built-in profiler has two interfaces. For Web developers there is a simplified profiler that can be opened from the menu Tools > Web Developer > Performance. A more advanced interface for developers of Mozilla's internals can be obtained by installing Benoit Girard's Gecko Profiler add-on.

    Using the Add-on

    Reporting a Performance Problem has a step-by-step guide for obtaining a profile when requested by Firefox developers.

    Reporting a Thunderbird Performance Problem has a step-by-step guide for obtaining a profile when requested by Thunderbird developers.

    The profiler uses a fixed-size buffer to store a few seconds' worth of samples. When it runs out of space, it discards the oldest entries. Stopping the profiler throws away the buffer; starting it again creates a new buffer and begins to fill it. When a profile is taken, it is sent for viewing to a web-based application called Cleopatra.
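
    The discard-oldest behaviour described above can be illustrated with a small ring buffer. This is a minimal Python sketch, not the profiler's actual implementation; the class name SampleBuffer, the capacity and the sample values are all hypothetical.

```python
from collections import deque


class SampleBuffer:
    """Fixed-size buffer that drops the oldest sample when full (hypothetical)."""

    def __init__(self, capacity):
        # A deque with maxlen discards items from the left once it is full.
        self._samples = deque(maxlen=capacity)

    def add(self, sample):
        self._samples.append(sample)

    def snapshot(self):
        # Taking a profile copies the current contents, oldest first.
        return list(self._samples)


buf = SampleBuffer(capacity=3)
for sample in ["s1", "s2", "s3", "s4", "s5"]:
    buf.add(sample)

print(buf.snapshot())  # ['s3', 's4', 's5']
```

    With room for three samples and five recorded, only the three most recent remain, which is why a long capture keeps only the last few seconds of activity.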

    Click the profiler icon in your toolbar to open the profiler panel. To take a profile you can use the buttons in the profiler panel or the keyboard shortcuts.

    • Ctrl+Shift+1 - Start/Stop the profiler
    • Ctrl+Shift+2 - Take a profile and launch Cleopatra to view it

    Understanding Cleopatra Profiles

    Understanding Cleopatra profiles is a bit more difficult. If you're unfamiliar with C++, you can click the JavaScript only button to see where your JavaScript code is slow. Generally, you will see two rows at the top: one for the parent app and one for the child app. Click on the entry you're interested in, e.g. the bottom row if you're interested in your content app. Each entry in the results shows a call stack and how much time is spent in that call stack. For example:

    In the results above we can see that we're spending ~113 milliseconds in cp_handleEvent() when running BrowserElementPanning.js, line 77. That means 113 ms for cp_handleEvent and ALL child functions that are called. We spend 56 ms of that in cp_onTouchStart. We then spend 41 ms in cp_getPannable, 40 ms of which is spent in cp_findPannable. Essentially, you're looking for the functions that are taking the most time, then you can figure out how to optimize them.
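
    Because each figure includes the time of all callees, it can help to subtract the children to see where the time is actually spent. The sketch below does that arithmetic in Python for the numbers quoted above. The helper self_times is hypothetical, and only the calls named in the text are listed, so time spent in callees that are not mentioned is counted against the parent.

```python
# Inclusive times (ms) quoted in the example above.
# Each node is (function_name, inclusive_ms, [children]).
call_tree = ("cp_handleEvent", 113, [
    ("cp_onTouchStart", 56, [
        ("cp_getPannable", 41, [
            ("cp_findPannable", 40, []),
        ]),
    ]),
])


def self_times(node):
    """Return {function: inclusive time minus the time of its listed children}."""
    name, inclusive_ms, children = node
    result = {name: inclusive_ms - sum(child[1] for child in children)}
    for child in children:
        result.update(self_times(child))
    return result


times = self_times(call_tree)
for name, ms in times.items():
    print(f"{name}: {ms} ms")
```

    Here cp_getPannable itself accounts for only 1 ms, while cp_findPannable accounts for 40 ms, so cp_findPannable is the better optimization target of the two.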

    Understanding Frames In Cleopatra

    If you profile the Compositor thread in the main b2g process, you will get a Frames view with an entry for each new frame that was produced. It will look something like this:

    We want to take a look at the Frames view. We see a couple of things:

    1. Vsync [Orange] - This is a vsync marker. It is available when the hardware composer on the device is supplying vsync events. Since a vsync occurs at a single point in time, it is drawn as a block only for visual readability. The actual vsync time occurred at the start of the vsync box. See Project Silk for why this is important.
    2. Refresh [Green] - This is what's occurring during an nsRefreshDriver::Tick phase.
    3. Scripts [Blue] - This is some JavaScript causing a change in layout.
    4. DisplayList [Red] - The amount of time taken to create a display list.
    5. Reflow [Green] - Amount of time spent during reflow.
    6. Composite [Purple] - Amount of time spent performing a composite.
    7. LayerTransaction - Amount of time spent performing a layer transaction.

    Profiling Boot to Gecko (with a real device)

    There is a script called profile.sh in the root of the B2G tree that simplifies most of the steps of grabbing profile information from the phone. It will profile both Gecko and the running JavaScript. To use this script:

    1. You need to have a local build of B2G. Make sure you build with export MOZ_PROFILING=1 in your .userconfig file (See Customizing your .userconfig). This will not work with prebuilt binaries. Note: if you have a debug build (export B2G_DEBUG=1 in .userconfig), there is no need to additionally export MOZ_PROFILING.
    2. You need to have your phone plugged into your PC and have it accessible via ADB.

    The general steps for using the profiler are:

    1. Start the app you want to profile and perform all the steps required to get you up to the point just before the slow action that you want to investigate.
    2. Start the profiler.
    3. Perform all the actions you want to investigate on the phone.
    4. Capture the profile and stop the profiler.
    5. Upload and Share the profile to Cleopatra.

    Start the Profiler

    TL;DR: Read the instructions below for more details, but the general command structure for profiling rendering is ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p YOUR APP NAME HERE, for example ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings. You can find the list of running apps with ./profile.sh ps. Run ./profile.sh capture after performing the actions you want to profile. You must always profile the b2g process in addition to your app.

    Starting the profiler is done separately for each process. The general guideline is to start the profiler on the compositor thread of the B2G parent process (effectively the window manager) and on the app you want to profile. For example, ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings will start the profiler on the B2G compositor and on the Settings app. If you start the profiler with this command, you should get output like this:

    ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings
    Process: b2g
    Using default features js,leaf
    Starting profiling PID 500..
    Profiler started
    
    Process: Settings
    Using default features js,leaf
    Starting profiling PID 611..
    Profiler started

    Note: Using ./profile.sh start, without any arguments, will reset the device and start the profiler on all processes once the phone reboots. This mode is deprecated.

    If you don't know the name of your app or you get an error saying a process doesn't exist, try running ./profile.sh ps. You should get something like this:

    ./profile.sh ps
      PID Name
    ----- ----------------
      845 b2g              profiler not running
      893 (Nuwa)           profiler not running
      909 Homescreen       profiler not running
      937 Messages         profiler not running
     1024 (Preallocated a  profiler not running

    Note: Process names are CASE SENSITIVE.
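
    If you want to look up process names from a script, the listing above can be parsed line by line. The following Python sketch assumes the column layout shown in the sample output (a PID, a name, then at least two spaces before the status text). The helper parse_ps is hypothetical, and the only status string known from the sample is "profiler not running".

```python
import re

# Sample output copied from the listing above.
PS_OUTPUT = """\
  PID Name
----- ----------------
  845 b2g              profiler not running
  893 (Nuwa)           profiler not running
  909 Homescreen       profiler not running
  937 Messages         profiler not running
 1024 (Preallocated a  profiler not running
"""

# PID, then the name, then at least two spaces before the status text.
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s{2,}(.+)$")


def parse_ps(text):
    """Return a list of (pid, name, status) tuples from ps-style output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            pid, name, status = match.groups()
            rows.append((int(pid), name, status.strip()))
    return rows


processes = parse_ps(PS_OUTPUT)
names = [name for _, name, _ in processes]
print(names)
print("homescreen" in names)  # False: names are case sensitive
```

    The header and separator lines are skipped because they do not start with a PID.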

    Profiler Options

    1. ./profile.sh start Starts the profiler on a specific process / thread. For example, profiling the 'Compositor' thread on the B2G process is useful for profiling scrolling/drawing. The associated flags are:
      • -p: Which process to profile (b2g, Email, etc.).
      • -e: The number of profile entries to capture, which determines how much profile data the profiler keeps. This is a circular buffer.
      • -s: The stack scan mode, as detailed below in the description of MOZ_PROFILER_STACK_SCAN.
      • -i: The sampling interval, in milliseconds.
      • -m: The profiler mode.
      • -f: The features to enable for the profiler. The default is js,leaf (JavaScript profiling and leaf mode).
      • -t: Which threads to profile. To specify multiple threads, use a single argument with comma-separated names, e.g. -t Compositor,GeckoMain. (Do not specify a second -t argument, it will just override the first.) Note that the thread name of the Gecko main thread is GeckoMain.
    2. ./profile.sh ps will show running B2G processes and whether the profiler was enabled for those processes or not.
    3. ./profile.sh capture [pid or name] will initiate a capture. If you don't specify any arguments, then all currently running B2G processes will be captured. Otherwise the B2G process with the indicated pid or name will be captured. The profile script will pull the profile files from the phone, and add symbols.
    4. The profile script uses some variables from the file .var.profile, which is generated by the build. These will allow the script to locate your objdir-gecko tree, the appropriate toolchain, and the out/target/product/<phone> tree to get symbols for the Android libraries.
    5. The .txt files will be renamed when pulled to the host and will have the following pattern: profile_HHMM_PID_NAME.txt (or .sym). If your capture includes multiple processes then they'll all have the same HHMM portion. The PID will be the PID of the process, and NAME will be the app name (as per the ps output).
    6. The .sym files (which are the .txt files with symbols added) can then be uploaded to the Cleopatra UI.
    7. ./profile.sh stop will kill the currently running b2g and restart it normally (i.e. profiling disabled).
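
    When scripting several profiling runs, the start commands described above can be assembled programmatically. This Python sketch only builds the argument lists from the -p and -t flags documented in the list; the helper start_command is hypothetical, and actually running the commands would need a B2G tree and a connected device.

```python
def start_command(process, threads=None, script="./profile.sh"):
    """Build the argument list for starting the profiler on one process."""
    args = [script, "start", "-p", process]
    if threads:
        # Multiple threads go into a single comma-separated -t argument;
        # a second -t would override the first.
        args += ["-t", ",".join(threads)]
    return args


commands = [
    start_command("b2g", threads=["Compositor"]),
    start_command("Settings"),
]
for cmd in commands:
    print(" ".join(cmd))
```

    Each list can be handed to subprocess.run from the root of the B2G tree. Joining the thread names into one -t value mirrors the rule that a second -t overrides the first.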

    Some extra commands available in the profile script (these will not be needed normally, but can be useful if you're working on the script):

    1. ./profile.sh ls will show all of the profile files stored on the phone (it looks in the /data/local/tmp directory).
    2. ./profile.sh ps will show all Gecko processes and if they are being profiled.
    3. ./profile.sh signal [pid] triggers the profiler to store the current profile buffers to files on the phone.
    4. ./profile.sh pull pid [NAME [HHMM]] will pull the profile file for the indicated pid and rename it as mentioned above.
    5. ./profile.sh symbolicate filename will take the profile_HHMM_PID_NAME.txt file and create profile_HHMM_PID_NAME.sym, which has symbols in it.
    6. ./profile.sh help will print out all of the commands currently supported by the script.
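
    The profile_HHMM_PID_NAME naming pattern described above is easy to take apart when sorting pulled files. Below is a minimal Python sketch; the helper parse_profile_name and the example file name are hypothetical. Names that do not follow the pattern, such as the merged profile_captured.sym or the shorter names in the sample capture output later in this article, are reported as None.

```python
import re

# profile_HHMM_PID_NAME.txt or .sym, as described in the list above.
PROFILE_NAME = re.compile(r"^profile_(\d{4})_(\d+)_(.+)\.(txt|sym)$")


def parse_profile_name(filename):
    """Split a pulled profile file name into its parts, or return None."""
    match = PROFILE_NAME.match(filename)
    if match is None:
        return None
    hhmm, pid, name, ext = match.groups()
    return {
        "time": hhmm,
        "pid": int(pid),
        "name": name,
        "symbolicated": ext == "sym",
    }


info = parse_profile_name("profile_1432_611_Settings.sym")
print(info)
print(parse_profile_name("profile_captured.sym"))  # None
```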

    Other Ways to Profile B2G:

    Linux's perf tool can also be used: see bug 831611, comments 53 and 60. Do not set BRANCH= when running config.sh via perf; just run ./config.sh.

    Capture the Profile

    Once you have finished all the actions under investigation, you need to capture the profile. You can capture the profile by running ./profile.sh capture. Your output should be something like this:

    ./profile.sh capture
    Signaling Profiled Processes: 500 611
    Stabilizing 500 b2g ...
    Pulling /data/local/tmp/profile_0_500.txt into profile_500_b2g.txt
    Adding symbols to profile_500_b2g.txt and creating profile_500_b2g.sym ...
    Stabilizing 611 Settings ...
    Pulling /data/local/tmp/profile_2_611.txt into profile_611_Settings.txt
    Adding symbols to profile_611_Settings.txt and creating profile_611_Settings.sym ...
    Merging profile: profile_500_b2g.sym profile_611_Settings.sym
    
    Results: profile_captured.sym
    Removing old profile files (from device) ... done

    Important: If you do not see the line Results: profile_captured.sym, YOUR PROFILE WAS NOT SUCCESSFULLY CAPTURED. Try it again. This should be very uncommon now.
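
    In an automated workflow, the check for the Results: line can be done in code rather than by eye. This Python sketch scans captured output for that line; the helper captured_result is hypothetical, and the sample strings are shortened from the output shown above.

```python
def captured_result(output):
    """Return the file named on the 'Results:' line, or None if it is missing."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Results:"):
            return line[len("Results:"):].strip()
    return None


good = (
    "Merging profile: profile_500_b2g.sym profile_611_Settings.sym\n"
    "\n"
    "Results: profile_captured.sym\n"
)
bad = "Signaling Profiled Processes: 500 611\nStabilizing 500 b2g ...\n"

print(captured_result(good))  # profile_captured.sym
print(captured_result(bad))   # None
```

    A None result means the capture should be retried.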

    Stop the Profiler

    You can stop the profiler now: Run ./profile.sh stop, and your phone should reboot.

    ./profile.sh stop
    Profiler appears to be running.
    Killing b2g ........
    b2g doesn't seem to want to go away. Try rebooting.

    Upload and Share the Profile to Cleopatra

    You should now have a file called profile_captured.sym. Head on over to Cleopatra to view the results — under Upload your profile here, click the Browse... button and select the profile_captured.sym file.

    After a few seconds, you should see something like this:

    The next best thing you can do is to share it. Push the Share button on the bottom left of the interface:

    After it finishes uploading, the URL should now be something like https://people.mozilla.org/~bgirard/cleopatra/#report=03e8dc46769c50751c23cbb9d707e980f96f56b5. You can now send that link to whoever you want to share the results with!

    Profiling local Windows builds

    If you built Firefox for Windows locally and you would like to use the local symbols with the profiler, you will need to run an additional tool; see Profiling with the Built-in Profiler and Local Symbols on Windows.

    Profiling Firefox mobile

    1. For local builds of Fennec, you should build with optimization and STRIP_FLAGS="--strip-debug" but NOT with --enable-profiling. Nightly builds are already built with the appropriate flags.
    2. You'll need to have adb and arm-eabi-addr2line (which is part of the Android NDK) in your bash PATH, so use locate arm-eabi-addr2line (on Linux) or mdfind name:arm-eabi-addr2line (on OS X) and stick an export to its location in ~/.bash_profile. The extension will invoke bash to use adb and addr2line.
    3. Install the latest pre-release build of the add-on in the Firefox browser on your host machine, which must be able to reach your phone via ADB. This will add an icon in the top right of the browser.
    4. Set devtools.debugger.remote-enabled to true in about:config for Fennec.
    5. Select target Mobile USB and press Connect. The first run will take about an additional minute to pull in the required system libraries.

    Profiling JS benchmark (xpcshell)

    1. You'll need a custom build of xpcshell that includes the following patches: the 100µs sampling patch (bug 807854) and, on Linux, the experimental patch to enable native stacks (bug 812946).
    2. To profile the script run.js with IonMonkey (-I), type inference (-n) and JägerMonkey (-m), run the following command:
      $ xpcshell -m -I -n -e '
          const Ci = Components.interfaces;
          const Cc = Components.classes;
          var profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler);
          profiler.StartProfiler(
            10000000 /* = profiler memory */,
            1 /* = sample rate: 100µs with patch, 1ms without */,
            ["stackwalk", "js"], 2 /* = features, and number of features. */
          );
        ' -f ./run.js -e '
          var profileObj = profiler.getProfileData();
          print(JSON.stringify(profileObj));
        ' | tail -n 1 > run.cleo
      xpcshell prints all of the benchmark output and, on its last line, the result of the profiling. You can filter it with tail -n 1 and redirect it to a file to avoid printing it in your shell. The expected size of the output is around 100 MB.
    3. To add symbols to your build, you need to call ./scripts/profile-symbolicate.py, available in the B2G repository. If libraries are not found, you will need to patch the script with bug 812063's attachment.
      $ GECKO_OBJDIR=<objdir> PRODUCT_OUT=<objdir> TARGET_TOOLS_PREFIX= \
          ./scripts/profile-symbolicate.py -o run.symb.cleo run.cleo
    4. Clone Cleopatra and start the server with ./run_webserver.sh.
    5. Access Cleopatra from your web browser by loading the page localhost:8000, and upload run.symb.cleo to render the profile with most of the symbol information.

    Native stack vs. Pseudo stack

    The profiler periodically samples the stack(s) of thread(s) in Firefox, collecting a stack trace, and presents the aggregated results using the Cleopatra UI. Stack traces can be collected in two different ways: Pseudostack (the default) or Nativestack.

    Pseudostack

    With Pseudostack, we sidestep the difficulties and performance overheads of unwinding stacks in a robust and platform independent way by using function entry/exit tags added by hand to important points in the code base.  The stacks you see in the UI are chains of these tags.  This gives robust stacks that work on all platforms, but they miss out on un-annotated areas of the code base, and give no visibility into system libraries or drivers.

    Tagging is done by adding macros of the form PROFILER_LABEL("NAMESPACE", "NAME"). These add RAII helpers, which are used by the profiler to track entries/exits of the annotated functions.  For this to be effective, you need to liberally use PROFILER_LABEL throughout the code. See GeckoProfiler.h for more variations like PROFILER_LABEL_PRINTF.

    Because of the small overhead of the instrumentation, the sample label shouldn't be placed inside hot loops. A profile reporting that a large portion is spent in "Unknown" code indicates that the area being executed doesn't have any sample labels. As we focus on using this tool and add additional sample labels, coverage should improve.
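
    The entry/exit tagging idea can be illustrated outside of C++. The sketch below is a Python analogy, not Gecko code: a context manager plays the role of the RAII helper, and take_sample stands in for the sampler, which in the real profiler runs periodically rather than being called by hand. The label names are hypothetical.

```python
from contextlib import contextmanager

# The pseudostack: labels of the annotated scopes currently being executed.
pseudostack = []


@contextmanager
def profiler_label(namespace, name):
    """Push a label on entry and pop it on exit, like an RAII helper."""
    pseudostack.append(f"{namespace}::{name}")
    try:
        yield
    finally:
        pseudostack.pop()


def take_sample():
    # A sampler only sees the labels, not the un-annotated code in between.
    return list(pseudostack)


samples = []
with profiler_label("layout", "Reflow"):
    samples.append(take_sample())
    with profiler_label("layout", "BuildDisplayList"):
        samples.append(take_sample())
samples.append(take_sample())

print(samples)
```

    The last sample is empty because no labelled scope is active at that point, which corresponds to the "Unknown" time described above.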

    Nativestack

    Nativestack is an optional, platform-specific feature that isn't complete yet. The goal is to provide "native" (that is, real) stack traces on platforms that support it. Having this feature will give us detailed stacks and help us analyze problems where we're spending time in drivers and system libraries. We're working on building the proper stack walking and symbolization required to make this step work, and are looking for help with this feature.

    Note: On Windows XP, the native and pseudostacks do not interleave properly. There is, however, a workaround in the associated bug.

    Availability

    The profiler will operate in either Pseudostack or Nativestack mode depending on your environment. See above for details on these.

                Custom Build                           Nightly                                Release (Gecko 15.0+)
    Windows     Native stack (Custom steps)            Native stack                           Pseudo stack
    Mac         Native stack                           Native stack                           Pseudo stack
    Linux       Pseudo stack (Bug for Native stack)    Pseudo stack (Bug for Native stack)    Pseudo stack
    Fennec      Pseudo stack (Bug for Native stack)    Pseudo stack (Bug for Native stack)    Pseudo stack (19+)
    B2G         Native stack (EHABI unwinds)           Pseudo stack                           None (Bug)

    Using native stack unwinding on 32- and 64-bit Linux

    Nightly builds for 32- and 64-bit Linux now have native stack unwinding via Breakpad available.  This is controlled by a set of environment variables if you profile using a clean reboot of the phone (e.g. ./profile.sh start). Otherwise, these variables are passed in via the profile.sh script. Here are some recommended settings.  I suggest you use all of them, and adjust as appropriate.

    • MOZ_PROFILER_VERBOSE=1: This makes the logging output a bit more verbose, which helps to diagnose possible problems reading or using the DWARF CFI (unwind information) that is used.
    • MOZ_PROFILER_INTERVAL=50: This sets the sampling interval to the specified number of milliseconds. You can reduce this down to 1 millisecond, but I'd recommend you do some trial runs at 50 milliseconds and gradually reduce the interval. Native unwinding can be expensive, so you can end up with Firefox or Fennec being unresponsive if you set the interval too low.
    • MOZ_PROFILER_MODE=native: This controls how stack unwinding is done, and can take four values: help, native, pseudo and combined. With native, it uses Breakpad only to unwind the stacks. With pseudo, the stacks are pseudostacks only, as described above. With combined, both a native and a pseudo stack trace are obtained for each sample point, and are interleaved based on observed stack pointer values to create a combined trace. You can also set this to help to get a summary of all of these options.
    • MOZ_PROFILER_STACK_SCAN=0 (zero): Breakpad has multiple different schemes for unwinding the stack, of varying levels of trustworthiness: using DWARF CFI data, using frame pointers, and scanning the stack looking for probable return addresses. This last scheme is used when nothing else works. It can generate useful data, but can also add frames that are not really present, which is very confusing. By default, stack scanning is disallowed. You can selectively re-enable it by changing the value to 1, 2, 3, etc. This limits the number of frames obtained by stack scanning to the specified number, and truncates the trace if any more stack-scanned frames are found. This is best left at the default setting (zero). If, however, you absolutely need the profiler to unwind through some library in which it is getting stuck, try increasing it gradually, but be aware you may get bogus stack traces as a result.
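
    The effect of the stack scan limit can be pictured with a small model. This Python sketch follows the description above and is not Breakpad's code: frames are listed innermost first, each tagged with the unwinding scheme that produced it, and the trace is cut at the first stack-scanned frame beyond the limit. The function apply_stack_scan_limit, the scheme labels and the frame names are all hypothetical.

```python
def apply_stack_scan_limit(frames, limit=0):
    """Keep frames until more than `limit` of them came from stack scanning.

    `frames` is a list of (function_name, scheme) pairs, innermost first,
    where scheme is "cfi", "frame_pointer" or "scan" (hypothetical labels).
    """
    kept = []
    scanned = 0
    for name, scheme in frames:
        if scheme == "scan":
            scanned += 1
            if scanned > limit:
                break  # truncate the trace here
        kept.append((name, scheme))
    return kept


trace = [
    ("leaf", "cfi"),
    ("lib_helper", "scan"),
    ("lib_entry", "scan"),
    ("main", "cfi"),
]

print(len(apply_stack_scan_limit(trace, limit=0)))  # 1
print(len(apply_stack_scan_limit(trace, limit=1)))  # 2
print(len(apply_stack_scan_limit(trace, limit=2)))  # 4
```

    With the default of zero the trace stops at the library boundary; raising the limit to 2 lets the unwinder pass through both scanned frames and reach main, at the risk that those two frames are not real.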

    If you still get a pseudostack instead of a native stack trace, try enabling both the "Stackwalk" and "Breakpad" options in the profiler options.

    Profile Fails to Upload

    You can upload profiles up to about 10 MB in size to the public central storage (AppEngine). For larger profiles, you will have to download the profile and then either:

    1. Share the file, or
    2. Host the file yourself while allowing Access-Control-Allow-Origin *. For Apache (people.mozilla.org) use $ echo "Header set Access-Control-Allow-Origin *" > .htaccess and share the URL http://people.mozilla.com/~bgirard/cleopatra/?customProfile=<URL>, replacing <URL> with the location of your profile file.
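
    If you are not hosting on Apache, any server that sends the same header should work. As one possibility, here is a minimal Python sketch using the standard library's http.server to serve the current directory with Access-Control-Allow-Origin: *. The class and helper names, host and port are assumptions for illustration, and the address must be reachable from the browser that loads Cleopatra.

```python
import http.server


class CORSRequestHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from the current directory with a permissive CORS header."""

    def end_headers(self):
        # Same effect as "Header set Access-Control-Allow-Origin *" in Apache.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(host="127.0.0.1", port=8000):
    # Hypothetical helper; choose a host and port the viewer can reach.
    return http.server.HTTPServer((host, port), CORSRequestHandler)


# Usage (blocks until interrupted):
#     make_server().serve_forever()
```

    The location of the served profile file then takes the place of <URL> in the customProfile link above.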

    Profiling a hung process

    It is possible to get profiles from hung Firefox processes using lldb.

    1. After the process has hung, attach lldb.
    2. Type in:
      p (void)mozilla_sampler_save_profile_to_file("somepath/profile.txt")
    3. Clone mstange’s handy profile analysis repository.
    4. Run:
      python symbolicate_profile.py somepath/profile.txt

      This grafts symbols into the profile. mstange’s scripts do some fairly clever things to get those symbols: if your Firefox was built by Mozilla, then it will retrieve the symbols from the Mozilla symbol server. If you built Firefox yourself, it will attempt to use some cleverness to grab the symbols from your binary.

      Your profile will now, hopefully, be updated with symbols.

      Then, load up Cleopatra, and upload the profile.

      I haven’t yet had the opportunity to try this, but I hope to next week. I’d be eager to hear people’s experience giving this a go – it might be a great tool in determining what’s going on in Firefox when it’s hung!

    Profiling Threads

    SPS has rudimentary support for profiling multiple threads. To enable it, check the 'Multi-Thread' box then enter one or more thread names into the textbox beside it. Thread names are the strings passed to the base::Thread class at initialization. At present there is no central list of these thread names, but you can find them by grepping the source.

    If the filter you entered is invalid, no threads will be profiled. You can identify this by hitting Analyze (Cleopatra will show you an error message). If the filter is left empty, only the main thread is captured (as if you had not enabled Multi-Thread).

    Profiler Features

    The profiler supports several features. These are options to gather additional data in your profiles. Each option will increase the performance overhead of profiling so it's important to activate only options that will provide useful information for your particular problem to reduce the distortion.

    Jank-only

    This feature is deprecated. The goal was to only record samples while the browser was not responsive.

    Stackwalk

    When taking a sample, the profiler will attempt to unwind the stack using platform-specific code appropriate for the ABI. This will provide an accurate call stack for most samples. On ABIs where frame pointers are not available, this will cause a significant performance impact.

    JS Profiling

    JavaScript call stacks will be generated and interleaved with the C++ call stacks. This will introduce an overhead when running JS.

    GC Stats

    Will embed GC stats from 'javascript.options.mem.notify' in the profile.

    Breakpad

    This feature is currently deprecated.

    Main Thread IO

    This will interpose file I/O and report it in the profiles.

    Multi-Thread

    This will sample other threads. This field accepts a comma-separated list of thread names. A thread can only be profiled if it is registered with the profiler.

    Power

    Use the Intel Power Gadget driver to tag each sample with the power state of the CPU.

    GPU

    This will insert a timer query during compositing and show the result in the Frames view. This will approximate how much GPU time was spent compositing each frame.

    Layers & Texture

    The profiler can be used to view the layer tree at each composite, optionally with texture data. This can be used to debug correctness problems.

    Viewing the Layer Tree

    To view the layer tree, the layers.dump pref must be set to true in the Firefox or B2G program being profiled.

    Note: in B2G, layer dumping can also be enabled from the Developer menu in Settings.

    In addition, both the compositor thread and the content thread (in the case of B2G, the content thread of whichever app you're interested in) must be profiled. For example, on B2G, when profiling the Homescreen app, you might start the profiler with:

    ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Homescreen
    

    Having gotten a profile this way, the layer tree for a composite can be seen by clicking on a composite in the "Frames" section of Cleopatra (you may need to zoom in on a sub-range of samples to make individual composites large enough to be clicked). This will activate the "LayerTree" tab:

    Screenshot of layer tree view in Cleopatra, with no textures.

    In this screenshot, Composite #143 has been selected. The layer tree structure can be seen in the left panel. It contains, for each layer, the type of the layer, and various metrics about the layer, such as the visible region and any transforms. In the right panel, a visualization of the layer tree (based entirely on the aforementioned metrics) is shown. Hovering over a layer in the left panel highlights the layer in the right panel. This is useful for identifying what content each layer corresponds to. Here, I'm hovering over the last layer in the layer tree (a PaintedLayerComposite), and a strip at the top of the right panel is highlighted, telling me that this layer is for the system notification bar in B2G.

    Viewing Textures

    Sometimes, it's useful to see not only the structure of the layer tree for each composite, but also the rendered textures for each layer. This can be achieved by additionally setting the layers.dump-texture pref to true, or by adding -f layersdump to the profiler command line (the latter implies both the layers.dump and layers.dump-texture prefs).

    Warning: Dumping texture data slows performance considerably, and requires a lot of storage for the profile files. Expect rendering to happen at a significantly reduced frame rate when profiling this way, and keep the duration of the capture short, to ensure the samples of interest aren't overwritten.

    Here's how the Layer Tree view looks in Cleopatra with texture data:

    Screenshot of layer tree view in Cleopatra, with textures.

    This time, the visualization in the right panel shows the actual textures rather than just the outlines of the layers. This can be very useful for debugging correctness problems such as a temporary visual/rendering glitch, because it allows you to find the precise composite that shows the glitch, and look at the layer tree for that composite.

    Visualizing a layer tree without a profile

    If you have a layer dump from somewhere (such as from adb logcat on B2G), you can get Cleopatra to visualize it (just the structure of course, not textures) without needing a profile. To do so, paste the layer dump into the "Enter your profile data here" text field on the front page of Cleopatra:

    Screenshot of front page of Cleopatra, with pasted layer dump.

    The resulting "profile" will have the Layer Tree view enabled (but nothing else). This is useful in cases where you want to gain a quick visual understanding of a layer dump without having to take a profile.

    On B2G, each line of a layer dump in adb logcat output is prefixed with something like I/Gecko   (30593):. Cleopatra doesn't currently understand this prefix, so it needs to be removed before pasting.
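
    Removing the prefix by hand is tedious for long dumps. Here is a small Python sketch that strips it from every line; it assumes the prefix always has the I/Gecko (PID): shape shown above and that a single space after the colon is a separator rather than part of the dump. The helper name and the placeholder layer lines are hypothetical.

```python
import re

# Matches prefixes such as "I/Gecko   (30593): " at the start of a line.
LOGCAT_PREFIX = re.compile(r"^I/Gecko\s*\(\s*\d+\):\s?")


def strip_logcat_prefix(dump):
    """Remove the adb logcat prefix from every line of a layer dump."""
    return "\n".join(LOGCAT_PREFIX.sub("", line) for line in dump.splitlines())


# The layer lines below are placeholders, not real layer dump output.
raw = (
    "I/Gecko   (30593): <layer line 1>\n"
    "I/Gecko   (30593):   <layer line 2>\n"
)
print(strip_logcat_prefix(raw))
```

    Indentation after the first space is kept, since the nesting of a layer dump is carried by its leading whitespace.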

    Display List

    Dump the display list after each refresh with the texture data. This can be used to debug correctness problems.

    Contribute

    Note: While the profiler platform itself is completed, we are working on new feature requests. Watch the "Gecko Profiler" component in Bugzilla for more information.