Profiling with the Gecko Profiler

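The Gecko Profiler is a profiler built into Firefox. It is more tightly integrated with Firefox than external profilers, and it can be used in situations where external profilers are not available, such as on a non-developer's machine or a locked-down Android device.

The Gecko Profiler was previously known as "SPS" and as "the built-in profiler". We have changed as many references to the old names as possible, but some may still remain.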

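Getting the Gecko Profiler Add-on

First, make sure you are using a suitable build of Firefox. An official Nightly, Beta, or Release build will work. If you are using a local build, enable the --enable-profiling option in your mozconfig.

The Gecko Profiler has two interfaces:

  1. For web developers, there is a simplified profiler that can be opened from the Tools > Web Developer > Performance menu.
  2. Installing the Gecko Profiler add-on gives access to a more advanced interface intended for Mozilla's internal developers (installation details are available).

A step-by-step guide is available for obtaining a profile when a Firefox developer requests one while you are reporting a performance problem.

A step-by-step guide is also available for obtaining a profile when a Thunderbird developer requests one while you are reporting a Thunderbird performance problem.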

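Understanding Profiles

You can check the frequently asked questions about the Gecko Profiler.

Take a look at some of Ehsan's videos.

If there is a feature you think would be useful, file a bug with the details.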

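Interface

1. Timeline

The timeline has several rows of tracing markers (colored segments) that indicate interesting events. Hover over them to see more information. Below the tracing markers are rows that correspond to activity on various threads.

Tip: Threads annotated with "[default]" are in the parent (aka "UI", aka "browser chrome", aka "main") process, and those annotated with "[tab]" are in the Web content processes.

Tip: Long-running tasks in the parent process block all input and drawing in the browser UI (aka "UI jank"), whereas long-running tasks in a content process block interactivity with the page, although panning and zooming still work thanks to APZ.

Tracing markers
  • Red: Indicates that the event loop is unresponsive. Note that high-priority events such as vsync are not included here. Also note that this shows what would have happened had an event been waiting; it does not necessarily mean that an event was actually pending for that long.
  • Black: Indicates synchronous IPC calls.
Ranges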

Timeline showing ranged breadcrumbs and zoom icon

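Click and drag anywhere in the tracing marker or thread areas to zoom in on a range of time. Once a range is selected, a magnifying glass appears that zooms into that range. Clicking a tracing marker creates a selection corresponding to its duration, which makes it easy to zoom in on a time range of interest. As you zoom in on ranges, a breadcrumb trail is created that lets you easily navigate back to previously selected ranges or to the entire profile (labelled "Full Range").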

Tip: While zooming out to a previously-selected range deletes the narrower range, the browser back button can be used to restore the narrower range.

Thread Timeline(s)

Thread/Process Timelines: Below the tracing markers we have a list of profiled threads. These threads may come from different processes. In this case, we have the 'GeckoMain [default]' process' main thread, a content process' main thread, and the main thread of the compositor process. Each of these timelines is aligned with wall clock time. So, for example, when a thread is blocked, like 'GeckoMain [tab]', on a thread like 'GeckoMain [default]', we can see what's occurring on the latter thread that is preventing it from responding to the former.

X (Time) axis: The timelines go from left to right as wall clock time increases along the X axis. Elements in the timeline are spaced at the sampling frequency with an attempt to align them with time. Factors such as sampling or stack-walking variance and system load can lead to sampling delays which manifest as gaps in the timeline.
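
The gaps described above can be spotted mechanically from sample timestamps. The following is a minimal Python sketch, not the profiler's own logic: the timestamps, the 1 ms interval, and the 1.5x tolerance are all made-up values chosen for illustration.

```python
def find_gaps(timestamps_ms, interval_ms, tolerance=1.5):
    """Return (earlier, later) pairs of consecutive samples that are further
    apart than tolerance * interval_ms, i.e. places where samples are missing."""
    gaps = []
    for earlier, later in zip(timestamps_ms, timestamps_ms[1:]):
        if later - earlier > tolerance * interval_ms:
            gaps.append((earlier, later))
    return gaps


# Hypothetical sample times in milliseconds; one sampling delay after 3.0 ms.
samples = [0.0, 1.0, 2.1, 3.0, 7.2, 8.1, 9.0]
gaps = find_gaps(samples, interval_ms=1.0)
print(gaps)
```

Small jitter around the nominal interval is ignored; only the 4.2 ms jump between 3.0 and 7.2 is reported as a gap.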

Note: because this is a sampling profiler, be cautious when examining running time that is equal to the sampling interval. For very time-sensitive profiling, you may want to consider a non-sampling profiler.

Y (Stack) axis: The Y axis is the stack depth, not the CPU activity. The change in stack height is useful to find patterns like long blocking calls (long flatlines) or very tall spiky blocks (recursive calls and JS). With more experience you can read profiles faster by recognizing patterns. Also note that you can click on timeline elements (the selected element gets darker when selected) and the tree view (see below) reflects the selected element.

2. Call Tree

The Call Tree shows the samples organized by 'Running Time' which will show the data by wall clock time. There are lighter grey names to the right of tree elements that indicate where the code comes from. Be aware that elements can be from JavaScript, Gecko, or system libraries. Note that if some functions are not yet named properly, symbolication may not yet be finished.

Tip: You can right-click on a function name to get an option to copy its name to the clipboard.

A significant portion of time can be spent in idle, blocking calls like waiting for events. This is ideal for a responsive application to be ready to service incoming events. There are OS-specific waiting functions like NtWaitForMultipleObjects seen in the example above taken on Windows or mach_msg_trap on macOS.

Tip: You can quickly go deeper into the call tree by holding down your right arrow key. Alternatively, expand an entire tree segment by holding Alt and clicking on the arrow to the left of the collapsed tree segment.

As we progress into a more specific part of the tree, you'll notice that the 'Running time' decreases. This happens when a function has 2 or more non-trivial calls: the running time will be split between its children.
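
To make the split concrete, here is a minimal Python sketch that rebuilds running time per call path from sampled stacks. It is an illustration only: the function names in the stacks and the 1 ms sampling interval are invented, and the real Call Tree is built by the profiler UI, not by this code.

```python
from collections import Counter


def running_time(stacks, interval_ms=1.0):
    """Map each call path (root first) to the time it was on the stack.

    Every sample contributes one interval to each prefix of its stack, so a
    parent's running time is at least the sum of its children's.
    """
    totals = Counter()
    for stack in stacks:
        for depth in range(1, len(stack) + 1):
            totals[tuple(stack[:depth])] += interval_ms
    return totals


# Hypothetical samples, outermost frame first.
stacks = [
    ["main", "paint", "draw_text"],
    ["main", "paint", "draw_text"],
    ["main", "paint", "draw_image"],
    ["main", "run_script"],
]
totals = running_time(stacks)
for path, ms in sorted(totals.items()):
    print("  " * (len(path) - 1) + path[-1], ms)
```

Here main accounts for 4 ms, of which paint takes 3 ms; paint's time is in turn split between draw_text (2 ms) and draw_image (1 ms).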

Tip: Focus on one section of the tree by clicking on the "arrow-in-a-circle" icon that appears to the right of the tree element as you hover over it. A "tree breadcrumb" will appear similar to the range breadcrumbs noted above.

Clicking the "JavaScript only" option will only show JavaScript code in the Call Tree. You could compare the time with this option checked and the total time to get an idea of how much time was spent running JS. Note that long-running JS function execution may not actually be taking as long as you think because further down the call stack there may be something like painting happening.

Clicking the "Invert call stack" option will sort by the time spent in a function in descending order. Note that the running time here is only the running time of that particular frame and not the total of all called instances of that function. You can see the samples in the Timeline get darker as you select different functions in the Call Tree; these are samples that were taken when the selected function was running.
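
The inverted view can be approximated by attributing each sample to its innermost frame only. The Python sketch below assumes a fixed 1 ms sampling interval and uses invented function names; it shows the idea rather than the profiler's actual implementation.

```python
from collections import Counter


def self_time(stacks, interval_ms=1.0):
    """Attribute each sample to its leaf frame and sort by time, descending."""
    leaf_counts = Counter(stack[-1] for stack in stacks if stack)
    return [(name, count * interval_ms) for name, count in leaf_counts.most_common()]


# Hypothetical samples, outermost frame first.
stacks = [
    ["main", "paint", "draw_text"],
    ["main", "paint", "draw_text"],
    ["main", "paint", "draw_image"],
    ["main", "run_script"],
]
inverted = self_time(stacks)
print(inverted)
```

Note that main and paint do not appear at all: they were on the stack in every sample but were never the frame that was actually running.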

"Filter stacks" will allow you to search for functions by name. One of the easiest ways to find slowness caused by a page's JS is to type its URL into the "Filter stacks" box. You can then select corresponding Call Tree entries and watch the Timeline for entries in the content process main thread that get darker as you select Call Tree entries.

Tip: If things are blank elsewhere in the UI, you may have text entered into the "Filter stacks" box.
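
Conceptually, filtering keeps only the samples whose stack mentions the search text. The Python sketch below assumes a case-insensitive substring match, which may differ from what the UI really does; the frame strings and the example.com URL are hypothetical.

```python
def filter_stacks(stacks, needle):
    """Keep only samples in which at least one frame contains `needle`."""
    needle = needle.lower()
    return [
        stack for stack in stacks
        if any(needle in frame.lower() for frame in stack)
    ]


# Hypothetical samples; JS frames carry the URL of the script they come from.
stacks = [
    ["main", "onClick (https://example.com/app.js:12)"],
    ["main", "paint"],
    ["main", "render (https://example.com/app.js:40)", "paint"],
]
matching = filter_stacks(stacks, "example.com/app.js")
print(len(matching), "of", len(stacks), "samples match")
```

An empty result for a non-empty search string corresponds to the blank UI mentioned in the tip above.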

Custom Annotations

In bug 1334218 an annotation was added to PresShell::Paint to show the URL of the document being painted. These annotations are not too complex to add so if you would like something added, file a bug.

3. Sharing the profile

Click "Share..." > "Share", acknowledging that the URLs you had open and your Firefox extensions will be included in the profile data sent to the server. If you select a different time range, the URL revealed by pressing "Permalink" will change so that you can be sure the recipient of the URL will see the same things you are seeing.

Tips

Understanding profiles can be difficult. If you're unfamiliar with Gecko's internals, you can click the JavaScript only button to see where your JavaScript code is slow. Each entry in the Call Tree shows a call stack and how much time is spent in that call stack. For example: in the results above, we can see that we're spending ~287 milliseconds in Startup::XRE_InitChildProcess, 194 ms of which are spent in PVsync::Msg_Notify and all child functions that it calls. It is useful to scan down the "Running Time" column and look for when the time changes. While looking for performance problems, you're looking for the processes that are taking the most time; then you can figure out how to optimize them.

Common Performance Bugs in Firefox

Inefficient code that is on the reflow or restyle paths is often to blame for jank. So is code that is run often in the parent process or in parts of the codebase that apply to many users.

Synchronous reflow can be caused by JS that, for example, makes changes to the page content in a loop and queries the layout of the page in that same loop.

A PresShell:Flush means that we are either recomputing styles or recomputing layout. These sorts of flushes should be avoided if possible, as they can be quite expensive. Keep your eyes out for flushes like this that are blocking the main thread for a long time. If you notice these things happening in a loop, that's a bug to be fixed, since we're likely "layout thrashing".

Some more tips and answers to common questions are available in a mid-2017 FAQ document.

It's a good idea to search Bugzilla before filing a bug about a performance problem in Firefox, but sometimes it's hard to find issues that have already been reported. Therefore, it's usually a good idea to file a bug.

Profiling local Windows builds

If you built Firefox for Windows locally and you would like to use the local symbols with the profiler, you will need to run an additional tool; see Profiling with the Gecko Profiler and Local Symbols on Windows.

Profiling Try builds

The profiler currently doesn't really support symbolication for profiles from Try builds. For Linux builds, there seem to be symbols inside the binaries, which the profiler should pick up correctly. But on Windows and macOS, you'll have to do some tricks to make it work:

  1. Put your firefox build into a directory with the name dist.
  2. Download the crashreporter symbols zip for your build. It should be one of the "artifacts" of the build job of your try build.
  3. Unzip the crashreporter symbols into dist/crashreporter-symbols/.
  4. Now profile as usual.

(This abuses the symbolication logic for local builds. It's at ext-geckoProfiler.js and may stop working at any time.)
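
Steps 1 to 3 above can be scripted. The following Python sketch uses only the standard library; the function name and the example paths are hypothetical, and it assumes the symbols archive is a plain zip file as described.

```python
import zipfile
from pathlib import Path


def install_symbols(symbols_zip, build_dir):
    """Unzip crashreporter symbols into <build_dir>/crashreporter-symbols/.

    build_dir must be named "dist", as step 1 requires.
    """
    build_dir = Path(build_dir)
    if build_dir.name != "dist":
        raise ValueError(f"expected a directory named 'dist', got {build_dir.name!r}")
    target = build_dir / "crashreporter-symbols"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(symbols_zip) as archive:
        archive.extractall(target)
    return target


# Example call (paths are placeholders for your own download and build):
# install_symbols("target.crashreporter-symbols.zip", "/path/to/dist")
```

After this, profile as usual (step 4). As noted above, the underlying behavior may stop working at any time.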

Profiling Firefox mobile

Firefox 61 for Android supports Gecko profiler again; see Remote profiling on Android for details.

The following information applies to older versions of Firefox for Android.

  1. For local builds of Fennec, you should build with optimization and STRIP_FLAGS="--strip-debug" but NOT with --enable-profiling. Nightly builds are already built with the appropriate flags.
  2. You'll need to have adb and arm-eabi-addr2line (which is part of the Android NDK) in your bash PATH, so use locate arm-eabi-addr2line (on Linux) or mdfind name:arm-eabi-addr2line (on OS X) and stick an export to its location in ~/.bash_profile. The extension will invoke bash to use adb and addr2line.
  3. Install the latest pre-release build in your host machine's Firefox browser that has your phone reachable via ADB. This will add an icon in the top right of the browser.
  4. Set devtools.debugger.remote-enabled to true in about:config for Fennec.
  5. Select target Mobile USB and press Connect. The first run will take an additional 1 minute or so to pull in the required system libraries.

Profiling Firefox startup

  1. Start your Firefox with the environment variable MOZ_PROFILER_STARTUP=1 set. This way the profiler is started as early as possible during startup.
  2. Then capture the profile using the add-on as usual.

Startup profiling does not use the settings that you configured in the add-on's panel. It uses settings that can be configured with the environment variables MOZ_PROFILER_STARTUP_ENTRIES and MOZ_PROFILER_STARTUP_INTERVAL:

  • If it looks like the buffer is not large enough, you can tweak the buffer size with the env var MOZ_PROFILER_STARTUP_ENTRIES. This defaults to 1000000, which is 9MB. If you want 90MB use 10000000, and 20000000 for 180MB, which are good values to debug long startups.
  • If you'd like a coarser resolution, you can also choose a different interval using MOZ_PROFILER_STARTUP_INTERVAL, which defaults to 1 (unit is millisecond). You can't go below 1 ms but you can use e.g. 10 ms.
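
The arithmetic behind these variables can be captured in a small helper. This Python sketch derives 9 bytes per entry from the figures above (1000000 entries for 9MB); the function name and its parameters are hypothetical, and the "firefox" binary name in the final comment is an assumption about your system.

```python
import os

BYTES_PER_ENTRY = 9  # derived from the text: 1000000 entries is about 9MB


def startup_env(buffer_mb=None, interval_ms=None, base_env=None):
    """Build an environment that enables startup profiling."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_PROFILER_STARTUP"] = "1"
    if buffer_mb is not None:
        entries = int(buffer_mb * 1_000_000 // BYTES_PER_ENTRY)
        env["MOZ_PROFILER_STARTUP_ENTRIES"] = str(entries)
    if interval_ms is not None:
        if interval_ms < 1:
            raise ValueError("the interval cannot go below 1 ms")
        env["MOZ_PROFILER_STARTUP_INTERVAL"] = str(int(interval_ms))
    return env


env = startup_env(buffer_mb=90, interval_ms=10, base_env={})
print(env)

# To launch with these settings (assuming "firefox" is on your PATH):
# import subprocess; subprocess.run(["firefox"], env=startup_env(buffer_mb=90))
```

A 90MB buffer maps to 10000000 entries and 180MB to 20000000, matching the values suggested above.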

Profiling JS benchmarks (xpcshell)

  1. To profile the script run.js with IonMonkey (-I), type inference (-n) and JägerMonkey (-m), use the following command:
    $ xpcshell -m -I -n -e '
        const Ci = Components.interfaces;
        const Cc = Components.classes;
        var profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler);
        profiler.StartProfiler(
          10000000 /* = profiler memory */,
          1 /* = sample rate: 100µs with patch, 1ms without */,
          ["stackwalk", "js"], 2 /* = features, and number of features. */
        );
      ' -f ./run.js -e '
        var profileObj = profiler.getProfileData();
        print(JSON.stringify(profileObj));
      ' | tail -n 1 > run.cleo
    xpcshell prints all benchmark information and, on its last line, the result of the profiling. You can filter it with tail -n 1 and redirect it to a file to prevent printing it in your shell. The expected size of the output is around 100 MB.
  2. To add symbols to your build, you need to call ./scripts/profile-symbolicate.py, available in the B2G repository.
    $ GECKO_OBJDIR=<objdir> PRODUCT_OUT=<objdir> TARGET_TOOLS_PREFIX= \
        ./scripts/profile-symbolicate.py -o run.symb.cleo run.cleo
  3. Clone Cleopatra and start the server with ./run_webserver.sh.
  4. Access Cleopatra from your web browser by loading the page localhost:8000, and upload run.symb.cleo to render the profile with most of the symbol information.
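
Before symbolicating, it can be worth checking that the captured last line really is a JSON profile. The Python sketch below mirrors the tail -n 1 step from the command in step 1, except that it skips trailing blank lines; the sample output and the "threads" key are invented for the example and say nothing about the real profile layout.

```python
import json


def extract_profile(xpcshell_output):
    """Parse the last non-empty line of the xpcshell output as JSON."""
    lines = [line for line in xpcshell_output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output to parse")
    return json.loads(lines[-1])


# Hypothetical output: benchmark chatter followed by the profile on the last line.
output = 'benchmark score: 1234\n{"threads": []}\n'
profile = extract_profile(output)
print(sorted(profile))
```

If the last line is not valid JSON, json.loads raises an error, which usually means the benchmark printed something after the profile.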

Native stack vs. label stack

The profiler periodically samples the stack of the selected threads in Firefox and collects a stack trace. This stack trace is the combined stack of three different stacks: The native stack, the JavaScript stack, and the label stack.

Native stack

The native stack is the regular C / C++ / Rust function stack that you know from your debugger. It's only collected if the "Stack walk" checkbox in the Gecko Profiler add-on's settings is checked.

JavaScript stack

The JavaScript stack is collected by the JS engine. This is controlled by the "JavaScript" checkbox in the Gecko Profiler add-on's settings panel.

Label stack

The label stack (formerly called "Pseudo stack") uses function entry/exit tags added by hand to important points in the code base.  The stacks you see in the UI are chains of these tags.  This is good for highlighting particularly interesting parts of the code, but they miss out on un-annotated areas of the code base, and give no visibility into system libraries or drivers.

Tagging is done by adding macros of the form AUTO_PROFILER_LABEL("NAMESPACE", "NAME"). These add RAII helpers, which are used by the profiler to track entries/exits of the annotated functions.  For this to be effective, you need to liberally use AUTO_PROFILER_LABEL throughout the code. See GeckoProfiler.h for more variations like AUTO_PROFILER_LABEL_DYNAMIC.

Because of the non-zero overhead of the instrumentation, the sample label shouldn't be placed inside hot loops.  A profile reporting that a large portion is spent in "Unknown" code indicates that the area being executed doesn't have any sample labels.  As we focus on using this tool and add additional sample labels, coverage should improve.
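
The entry/exit tracking can be illustrated with a Python analogy. This is not Gecko's implementation (which is the C++ RAII macro described above): the LabelStack class is invented for this sketch, a context manager stands in for the RAII helper, and the "Layout"/"Reflow" label is hypothetical.

```python
from contextlib import contextmanager


class LabelStack:
    """Toy label stack: labels are pushed on entry and popped on exit."""

    def __init__(self):
        self._frames = []

    @contextmanager
    def label(self, namespace, name):
        self._frames.append(f"{namespace}::{name}")
        try:
            yield
        finally:
            # Popped even if the labelled code raises, like an RAII destructor.
            self._frames.pop()

    def sample(self):
        """Return a snapshot of the labels that are currently active."""
        return list(self._frames)


labels = LabelStack()
samples = [labels.sample()]            # unlabelled code: nothing to report
with labels.label("PresShell", "Paint"):
    samples.append(labels.sample())
    with labels.label("Layout", "Reflow"):
        samples.append(labels.sample())
samples.append(labels.sample())        # back in unlabelled code
print(samples)
```

The first and last samples are empty, which is the toy equivalent of time showing up as unlabelled code in a profile.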

Sharing, saving and loading profiles

After capturing and viewing a profile you will see "Share..." and "Save as file..." buttons in the top-right of the window. Sharing will upload your profile to perf-html.io and make it public. More information on sharing profiles is available.

Profiling hung processes

It is possible to get profiles from hung Firefox processes using lldb.

  1. After the process has hung, attach lldb.
  2. Type in:
    p (void)profiler_save_profile_to_file("somepath/profile.txt")
  3. Clone mstange’s handy profile analysis repository.
  4. Run:
    python symbolicate_profile.py somepath/profile.txt

    This grafts symbols into the profile. mstange’s scripts do some fairly clever things to get those symbols – if your Firefox was built by Mozilla, then it will retrieve the symbols from the Mozilla symbol server. If you built Firefox yourself, it will attempt to use some cleverness to grab the symbols from your binary.

    Your profile will now, hopefully, be updated with symbols. Upload it for further analysis!

    I haven’t yet had the opportunity to try this, but I hope to next week. I’d be eager to hear people’s experience giving this a go – it might be a great tool in determining what’s going on in Firefox when it’s hung!

Profiling threads

The Gecko Profiler has rudimentary support for profiling multiple threads. To enable it, check the 'Multi-Thread' box then enter one or more thread names into the textbox beside it. Thread names are the strings passed to the base::Thread class at initialization. At present there is no central list of these thread names, but you can find them by grepping the source.

Examples: 1 2

If the filter you entered is invalid, no threads will be profiled. You can identify this by hitting Analyze (Cleopatra will show you an error message). If the filter is left empty, only the main thread is captured (as if you had not enabled Multi-Thread.)

Profiler features

The profiler supports several features. These are options to gather additional data in your profiles. Each option will increase the performance overhead of profiling so it's important to activate only options that will provide useful information for your particular problem to reduce the distortion.

Stackwalk

When taking a sample, the profiler will attempt to unwind the stack using platform-specific code appropriate for the ABI. This will provide an accurate callstack for most samples. On ABIs where frame pointers are not available, this will cause a significant performance impact.

JS Profiling

JavaScript callstacks will be generated and interleaved with the C++ callstacks. This will introduce an overhead when running JS.

GC Stats

Will embed GC stats from 'javascript.options.mem.notify' in the profile.

Main Thread IO

This will interpose file I/O and report them in the profiles.

Multi-Thread

This will sample other threads. This field accepts a comma-separated list of thread names. A thread can only be profiled if it is registered to the profiler.
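
As an illustration of how such a list might be checked, the Python sketch below splits a comma-separated filter and separates names that are registered from names that are not. Exact, case-sensitive name matching is an assumption made for this sketch and may not reflect how the profiler compares names; "ImageDecoder" is a made-up thread name.

```python
def split_thread_filter(filter_text, registered_threads):
    """Split a comma-separated filter into (profilable, unregistered) names."""
    requested = [part.strip() for part in filter_text.split(",") if part.strip()]
    registered = set(registered_threads)
    profilable = [name for name in requested if name in registered]
    unregistered = [name for name in requested if name not in registered]
    return profilable, unregistered


# Hypothetical set of threads registered with the profiler.
registered = ["GeckoMain", "Compositor"]
ok, missing = split_thread_filter("GeckoMain, Compositor, ImageDecoder", registered)
print("profilable:", ok)
print("not registered:", missing)
```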

GPU

This will insert a timer query during compositing and show the result in the Frames view. This will approximate how much GPU time was spent compositing each frame.

Layers & Texture

The profiler can be used to view the layer tree at each composite, optionally with texture data. This can be used to debug correctness problems.

Viewing the Layer Tree

To view the layer tree, the layers.dump pref must be set to true in the Firefox or B2G program being profiled.

In addition, both the compositor thread and the content thread (in the case of B2G, the content thread of whichever app you're interested in) must be profiled. For example, on B2G, when profiling the Homescreen app, you might start the profiler with:

./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Homescreen

Having gotten a profile this way, the layer tree for a composite can be seen by clicking on a composite in the "Frames" section of Cleopatra (you may need to zoom in to a sub-range of samples to make individual composites large enough to be clicked). This will activate the "LayerTree" tab:

Screenshot of layer tree view in Cleopatra, with no textures.

In this screenshot, Composite #143 has been selected. The layer tree structure can be seen in the left panel. It contains, for each layer, the type of the layer, and various metrics about the layer, such as the visible region and any transforms. In the right panel, a visualization of the layer tree (based entirely on the aforementioned metrics) is shown. Hovering over a layer in the left panel highlights the layer in the right panel. This is useful for identifying what content each layer corresponds to. Here, I'm hovering over the last layer in the layer tree (a PaintedLayerComposite), and a strip at the top of the right panel is highlighted, telling me that this layer is for the system notification bar in B2G.

Viewing Textures

Sometimes, it's useful to see not only the structure of the layer tree for each composite, but also the rendered textures for each layer. This can be achieved by additionally setting the layers.dump-texture pref to true, or by adding -f layersdump to the profiler command line (the latter implies both the layers.dump and layers.dump-texture prefs).

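Warning: Dumping texture data slows performance considerably and requires a lot of storage for the profile files. When profiling this way, expect rendering to run at a significantly reduced frame rate, and keep captures short so that the samples of interest are not overwritten.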

Here's how the Layer Tree view looks in Cleopatra with texture data:

Screenshot of layer tree view in Cleopatra, with textures.

This time, the visualization in right panel shows the actual textures rather than just the outlines of the layers. This can be very useful for debugging correctness problems such as a temporary visual/rendering glitch, because it allows you to find the precise composite that shows the glitch, and look at the layer tree for that composite.

Visualizing a layer tree without a profile

If you have a layer dump from somewhere (such as from adb logcat on B2G), you can get Cleopatra to visualize it (just the structure of course, not textures) without needing a profile. To do so, paste the layer dump into the "Enter your profile data here" text field on the front page of Cleopatra:

Screenshot of front page of Cleopatra, with pasted layer dump.

The resulting "profile" will have the Layer Tree view enabled (but nothing else). This is useful in cases where you want to gain a quick visual understanding of a layer dump without having to take a profile.

On B2G, each line of a layer dump in adb logcat output is prefixed with something like I/Gecko   (30593):. Cleopatra doesn't currently understand this prefix, so it needs to be removed before pasting.
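
Removing the prefix is easy to script. The Python sketch below strips a prefix shaped like the I/Gecko   (30593): example and at most one space after the colon, so that any further indentation is kept; whether Cleopatra needs that indentation is an assumption, and the layer names in the sample dump are only illustrative.

```python
import re

# Log level letter, "/Gecko", optional spaces, "(pid):", then at most one space.
LOGCAT_PREFIX = re.compile(r"^[A-Z]/Gecko\s*\(\s*\d+\): ?")


def strip_logcat_prefix(dump):
    """Remove the adb logcat prefix from every line of a layer dump."""
    return "\n".join(LOGCAT_PREFIX.sub("", line) for line in dump.splitlines())


# Hypothetical two-line excerpt of adb logcat output.
raw = (
    "I/Gecko   (30593): ClientLayerManager (0x1)\n"
    "I/Gecko   (30593):   ContainerLayer (0x2)"
)
cleaned = strip_logcat_prefix(raw)
print(cleaned)
```

Lines without the prefix pass through unchanged, so the function is safe to run on a dump that has already been cleaned.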

Display List

Dump the display list after each refresh with the texture data. This can be used to debug correctness problems.

Contribute

Document tags and contributors

Contributors to this page: silverskyvicto
Last updated by: silverskyvicto