Gecko ProfilerはFirefoxに組み込まれたプロファイラです。外部のプロファイラよりも Firefoxとの統合が強化されており、開発者以外のマシンやロックされたAndroid端末など、外部プロファイラが利用できない状況でも利用できます。
Tip: [default] と注釈されたスレッドは、親プロセス（「UI」、別名「ブラウザクロム」、別名「メイン」）プロセスにあり、[tab] で注釈付けされたスレッドはWebコンテンツ "）プロセス。
Tip: 親プロセスの長時間実行されるタスクはブラウザUI（「UIジャンク」とも呼ばれます）ですべての入力または描画をブロックしますが、コンテンツプロセスで長時間実行されるタスクはページとの対話性をブロックしますが、 APZのおかげです。
- 赤色：イベントループが応答していないことを示します。 vsyncなどの優先度の高いイベントはここには含まれていないことに注意してください。また、これは待っているイベントがあった場合に起こったことを示しており、必ずしもそれが保留中のイベントがあるとは限りません。
Tip: While zooming out to a previously-selected range deletes the narrower range, the browser back button can be used to restore the narrower range.
Thread/Process Timelines: Below the tracing markers we have a list of profiled threads. These threads may come from different processes. In this case, we have the 'GeckoMain [default]' process' main thread, a content process' main thread, and the main thread of the compositor process. Each of these timelines is aligned with wall clock time. So, for example, when a thread is blocked, like 'GeckoMain [tab]', on a thread like 'GeckoMain [default]', we can see what's occurring on the latter thread that is preventing it from responding to the former.
X (Time) axis: The timelines go from left to right as wall clock time increases along the X axis. Elements in the timeline are spaced at the sampling frequency with an attempt to align them with time. Factors such as sampling or stack-walking variance and system load can lead to sampling delays which manifest as gaps in the timeline.
Note: because this is a sampling profiler, be cautious when examining running time that is equal to the sampling interval. For very time-sensitive profiling, you may want to consider a non-sampling profiler.
Y (Stack) axis: The Y axis is the stack depth, not the CPU activity. The change in stack height is useful to find patterns like long blocking calls (long flatlines) or very tall spiky blocks (recursive calls and JS). With more experience you can read profiles faster by recognizing patterns. Also note that you can click on timeline elements (the selected element gets darker when selected) and the tree view (see below) reflects the selected element.
Tip: You can right-click on a function name to get an option to copy its name to the clipboard.
A significant portion of time can be spent in idle, blocking calls like waiting for events. This is ideal for a responsive application to be ready to service incoming events. There are OS-specific waiting functions like
NtWaitForMultipleObjects seen in the example above taken on Windows or
mach_msg_trap on macOS.
Tip: You can quickly go deeper into the call tree by holding down your right arrow key. Alternatively, expand an entire tree segment by holding Alt and clicking on the arrow to the left of the collapsed tree segment.
As we progress into a more specific part of the tree, you'll notice that the 'Running time' decreases. This happens when a function has 2 or more non-trivial calls: the running time will be split between its children.
Tip: Focus on one section of the tree by clicking on the "arrow-in-a-circle" icon that appears to the right of the tree element as you hover over it. A "tree breadcrumb" will appear similar to the range breadcrumbs noted above.
Clicking the "Invert call stack" option will sort by the time spent in a function in descending order. Note that the running time here is only the running time of that particular frame and not the total of all called instances of that function. You can see the samples in the Timeline get darker as you select different functions in the Call Tree; these are samples that were taken when the selected function was running.
"Filter stacks" will allow you to search for functions by name. One of the easiest ways to find slowness caused by a page's JS is to type its URL into the "Filter stacks" box. You can then select corresponding Call Tree entries and watch the Timeline for entries in the content process main thread that get darker as you select Call Tree entries.
Tip: If things are blank elsewhere in the UI, you may have text entered into the "Filter stacks" box.
In bug 1334218 an annotation was added to
PresShell::Paint to show the URL of the document being painted. These annotations are not too complex to add so if you would like something added, file a bug.
3. Sharing the profile
Click "Share..." > Share acknowledging that the URLs you had open and your Firefox extensions will be included in the profile data sent to the server. If you select a different time range, the URL revealed by pressing "Permalink" will change so that you can be sure the recipient of the URL will see the same things you are seeing.
Startup::XRE_InitChildProcess, 194 ms of which are spent in
PVsync::Msg_Notify and all child functions that it calls. It is useful to scan down the "Running Time" column and look for when the time changes. While looking for performance problems, you're looking for the processes that are taking the most time; then you can figure out how to optimize them.
Inefficient code that is on the reflow or restyle paths is often to blame for jank. So is code that is run often in the parent process or in parts of the codebase that apply to many users.
Synchronous re-flow can be caused by JS that, for example, makes changes to the page content in a loop and queries about the layout of the page in that same loop.
A PresShell:Flush means that we are either recomputing styles or recomputing layout. These sorts of flushes should be avoided if possible, as they can be quite expensive. Keep your eyes out for flushes like this that are blocking the main thread for a long time. If you notice these things happening in a loop, that's a bug to be fixed, since we're likely "layout thrashing".
Some more tips and answers to common questions are available in a mid-2017 FAQ document.
It's a good idea to search bugzilla before filing a bug about a performance problem in Firefox but sometimes it's hard to find issues that have already been reported. Therefore, it's usually a good idea to file a bug.
If you built Firefox for Windows locally and you would like to use the local symbols with the profiler, you will need to run an additional tool; see Profiling with the Gecko Profiler and Local Symbols on Windows.
The profiler currently doesn't really support symbolication for profiles from Try builds. For Linux builds, there seem to be symbols inside the binaries, which the profiler should pick up correctly. But on Windows and macOS, you'll have to do some tricks to make it work:
- Put your firefox build into a directory with the name
- Download the crashreporter symbols zip for your build. It should be one of the "artifacts" of the build job of your try build.
- Unzip the crashreporter symbols into
- Now profile as usual.
(This abuses the symbolication logic for local builds. It's at ext-geckoProfiler.js and may stop working at any time.)
Firefox 61 for Android supports Gecko profiler again; see Remote profiling on Android for details.
The following information is old version of Firefox for Android.
- For local builds of Fennec, you should build with optimization and
STRIP_FLAGS="--strip-debug"but NOT with
--enable-profiling. Nightly builds are already built with the appropriate flags.
- You'll need to have
arm-eabi-addr2line(which is part of the Android NDK) in your bash
PATH, so use
locate arm-eabi-addr2line(on Linux) or
mdfind name:arm-eabi-addr2line(on OS X) and stick an export to its location in
~/.bash_profile. The extension will invoke bash to use
- Install the latest pre-release build in your host machine's Firefox browser that has your phone reachable via ADB. This will add a icon in the top right of the browser.
- Select target Mobile USB and press Connect. The first run will take an additional 1 minute or so to pull in the required system libraries.
- Start your Firefox with the environment variable
MOZ_PROFILER_STARTUP=1set. This way the profiler is started as early as possible during startup.
- Then capture the profile using the add-on as usual.
Startup profiling does not use the settings that you configured in the add-on's panel. It uses settings that can be configured with the environment variables
- If it looks like the buffer is not large enough, you can tweak the buffer size with the env var
MOZ_PROFILER_STARTUP_ENTRIES. This defaults to 1000000, which is 9MB. If you want 90MB use 10000000, and 20000000 for 180MB, which are good values to debug long startups.
- If you'd like a coarser resolution, you can also choose a different interval using
MOZ_PROFILER_STARTUP_INTERVAL, which defaults to 1 (unit is millisecond). You can't go below 1 ms but you can use e.g. 10 ms.
- To profile the script
run.jswith IonMonkey (
-I), type inference (
-n) and JäegerMonkey (
-m). Thgis requires the following command:
The xpcshell output all benchmark information and on its last line it output the result of the profiling, you can filter it with
$ xpcshell -m -I -n -e ' const Ci = Components.interfaces; const Cc = Components.classes; var profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler); profiler.StartProfiler( 10000000 /* = profiler memory */, 1 /* = sample rate: 100µs with patch, 1ms without */, ["stackwalk", "js"], 2 /* = features, and number of features. */ ); ' -f ./run.js -e ' var profileObj = profiler.getProfileData(); print(JSON.stringify(profileObj)); ' | tail -n 1 > run.cleo
tail -n 1and redirect it to a file to prevent printing it in your shell. The expected size of the output is around 100 of MB.
- To add symbols to your build, you need to call
./scripts/profile-symbolicate.pyavailable in B2G repository.
$ GECKO_OBJDIR=<objdir> PRODUCT_OUT=<objdir> TARGET_TOOLS_PREFIX= \ ./scripts/profile-symbolicate.py -o run.symb.cleo run.cleo
- Clone Cleopatra and start the server with
- Access Cleopatra from your web browser by loading the page
localhost:8000, and upload
run.symb.cleoto render the profile with most of the symbol information.
The native stack is the regular C / C++ / rust function stack that you know from your debugger. It's only collected if the "Stack walk" checkbox in the gecko profiler add-on's settings is checked.
The label stack (formerly called "Pseudo stack") uses function entry/exit tags added by hand to important points in the code base. The stacks you see in the UI are chains of these tags. This is good for highlighting particularly interesting parts of the code, but they miss out on un-annotated areas of the code base, and give no visibility into system libraries or drivers.
Tagging is done by adding macros of the form
AUTO_PROFILER_LABEL("NAMESPACE", "NAME"). These add RAII helpers, which are used by the profiler to track entries/exits of the annotated functions. For this to be effective, you need to liberally use
AUTO_PROFILER_LABEL throughout the code. See
GeckoProfiler.h for more variations like
Because of the non-zero overhead of the instrumentation, the sample label shouldn't be placed inside hot loops. A profile reporting that a large portion is spent in "Unknown" code indicates that the area being executed doesn't have any sample labels. As we focus on using this tool and add additional sample labels coverage should improve.
After capturing and viewing a profile you will see "Share..." and "Save as file..." buttons in the top-right of the window. Sharing will upload your profile to perf-html.io and make it public. More information on sharing profiles is available.
It is possible to get profiles from hung Firefox processes using lldb1.
- After the process has hung, attach lldb.
- Type in2, :
- Clone mstange’s handy profile analysis repository.
python symbolicate_profile.py somepath/profile.txt
To graft symbols into the profile. mstange’s scripts do some fairly clever things to get those symbols – if your Firefox was built by Mozilla, then it will retrieve the symbols from the Mozilla symbol server. If you built Firefox yourself, it will attempt to use some cleverness3 to grab the symbols from your binary.
Your profile will now, hopefully, be updated with symbols. Upload it for further analysis!
I haven’t yet had the opportunity to try this, but I hope to next week. I’d be eager to hear people’s experience giving this a go – it might be a great tool in determining what’s going on in Firefox when it’s hung!
The Gecko Profiler has rudimentary support for profiling multiple threads. To enable it, check the 'Multi-Thread' box then enter one or more thread names into the textbox beside it. Thread names are the strings passed to the base::Thread class at initialization. At present there is no central list of these thread names, but you can find them by grepping the source.
If the filter you entered is invalid, no threads will be profiled. You can identify this by hitting Analyze (Cleopatra will show you an error message). If the filter is left empty, only the main thread is captured (as if you had not enabled Multi-Thread.)
The profiler supports several features. These are options to gather additional data in your profiles. Each option will increase the performance overhead of profiling so it's important to activate only options that will provide useful information for your particular problem to reduce the distortion.
When taking a sample the profiler will attempt to unwind the stack using platform specific code appropriate for the ABI. This will provide an accurate callstack for most samples. On ABIs where framepointers are not avaiable this will cause a significant performance impact.
This will interpose file I/O and report them in the profiles.
This will sample other threads. This fields accept a comma seperated list of thread names. A thread can only be profiled if it is registered to the profiler.
This will insert a timer query during compositing and show the result in the Frames view. This will appropriate how much GPU time was spent compositing each frame.
The profiler can be used to view the layer tree at each composite, optionally with texture data. This can be used to debug correctness problems.
Viewing the Layer Tree
To view the layer tree, the
layers.dump pref must be set to
true in the Firefox or B2G program being profiled.
In addition, both the compositor thread and the content thread (in the case of B2G, the content thread of whichever app you're interested in) must be profiled. For example, on B2G, when profiling the Homescreen app, you might start the profiler with:
./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Homescreen
Having gotten a profile this way, the layer tree for a composite can be seen by clicking on a composite in the "Frames" section of Cleopatra (you may need to a sub-range of samples to make individual composites large enough to be clicked). This will activate the "LayerTree" tab:
In this screenshot, Composite #143 has been selected. The layer tree structure can be seen in the left panel. It contains, for each layer, the type of the layer, and various metrics about the layer, such as the visible region and any transforms. In the right panel, a visualization of the layer tree (based entirely on the aforementioned metrics) is shown. Hovering over a layer in the left panel highlights the layer in the right panel. This is useful for identifying what content each layer corresponds to. Here, I'm hovering over the last layer in the layer tree (a PaintedLayerComposite), and a strip at the top of the right panel is highlighted, telling me that this layer is for the system notification bar in B2G.
Sometimes, it's useful to see not only the structure of the layer tree for each composite, but also the rendered textures for each layer. This can be achieved by additionally setting the
layers.dump-texture pref to
true, or by adding
-f layersdump to the profiler command line (the latter implies both the
警告: テクスチャデータをダンプすると、パフォーマンスが大幅に低下し、プロファイルファイルに多くの記憶領域が必要になります。 このようにプロファイリングする際には、フレームレートを大幅に下げてレンダリングを実行し、キャプチャ時間を短くして、関心のあるサンプルが上書きされないようにします。
Here's how the Layer Tree view looks in Cleopatra with texture data:
This time, the visualization in right panel shows the actual textures rather than just the outlines of the layers. This can be very useful for debugging correctness problems such as a temporary visual/rendering glitch, because it allows you to find the precise composite that shows the glitch, and look at the layer tree for that composite.
Visualizing a layer tree without a profile
If you have a layer dump from somewhere (such as from
adb logcat on B2G), you can get Cleopatra to visualize it (just the structure of course, not textures) without needing a profile. To do so, paste the layer dump into the "Enter your profile data here" text field on the front page of Cleopatra:
The resulting "profile" will have the Layer Tree view enabled (but nothing else). This is useful in cases where you want to gain a quick visual understanding of a layer dump without having to take a profile.
On B2G, each line of a layer dump in
adb logcat output is prefixed with something like
I/Gecko (30593):. Cleopatra doesn't currently understand this prefix, so it needs to be removed before pasting.
Dump the display list after each refresh with the texture data. This can be used to debug correctness problems.
- Source is located in
- The Bugzilla component is Core::Gecko Profile.
- The profiler add-on repository can be found here: https://github.com/devtools-html/Gecko-Profiler-Addon.
- The Cleopatra repository can be found here: https://github.com/devtools-html/perf.html