Xperf is part of the Microsoft Windows Performance Toolkit, and has functionality similar to that of Shark, oprofile, and (for some things) dtrace/Instruments. For stack walking, Windows Vista or higher is required; I haven't tested it at all on XP.
This page applies to xperf version 4.8.7701 or newer. To see your xperf version, either run 'xperf' on a command line with no arguments, or start 'xperfview' and look at Help -> About Performance Analyzer. (Note that it's not the first version number in the About window; that's the Windows version.)
If you have an older version, you will experience bugs, especially around symbol loading for local builds.
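As a rough illustration of the version requirement above, here is a minimal Python sketch that compares a version string against 4.8.7701. It assumes you have already read the version number by hand (from the xperf banner or the About window) and that it is a plain dotted string of integers; the function names are hypothetical and not part of any xperf tooling.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.8.7701' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum version this page applies to, per the text above.
MIN_XPERF = parse_version("4.8.7701")


def is_supported(version_text):
    """Return True if the given xperf version is at least the minimum."""
    return parse_version(version_text) >= MIN_XPERF


if __name__ == "__main__":
    for candidate in ("4.8.7701", "4.7.7000", "6.3.9600"):
        print(candidate, "ok" if is_supported(candidate) else "too old")
```

Tuples compare element by element, so no string comparison pitfalls (such as "4.10" sorting before "4.8") apply.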
For all versions, the tools are part of the latest Windows 7 SDK (SDK Version 7.1). Use the web installer to install at least the "Win32 Development Tools". Once the SDK is installed, run the installer for your architecture (for example, wpt_x64.msi) from the Redist/Windows Performance Toolkit folder of the SDK's install location (typically Program Files/Microsoft SDKs/Windows/v7.1/Redist/Windows Performance Toolkit) to actually install the Windows Performance Toolkit tools.
On 64-bit Windows, stack walking also requires that kernel-mode code not be paged out. Set the DisablePagingExecutive registry value from an elevated command prompt, then reboot:
REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f
Symbol Server Setup
With the latest versions of the Windows Performance Toolkit, you can modify the symbol path directly from within the program via the Trace menu. Just make sure you set the symbol paths before enabling "Load Symbols" and before opening a summary view. You can also modify the _NT_SYMBOL_PATH and _NT_SYMCACHE_PATH environment variables to make these changes permanent.
The standard symbol path that includes both Mozilla's and Microsoft's symbol server configuration is as follows:
To add symbols from your own builds, add the directory containing them to _NT_SYMBOL_PATH. As with all Windows paths, the symbol path uses semicolons (;) as separators.
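To make the semicolon-separated layout concrete, here is a small Python sketch that assembles a symbol path from individual entries. Everything specific in it is an assumption for illustration: the cache directory, the local build directory, and the helper names are hypothetical, and the two server URLs are the public Microsoft and Mozilla symbol servers as far as I know, which may not match the exact values this page originally listed. The srv*cache*url form is the usual Windows debugger syntax for a symbol server entry.

```python
def srv_entry(cache_dir, server_url):
    """Format one symbol-server entry in the srv*<cache>*<url> form."""
    return f"srv*{cache_dir}*{server_url}"


def build_symbol_path(entries):
    """Join symbol path entries with semicolons, skipping empty ones."""
    return ";".join(entry for entry in entries if entry)


# Hypothetical local cache directory; pick any writable folder.
CACHE = r"C:\symbols"

path = build_symbol_path([
    srv_entry(CACHE, "https://msdl.microsoft.com/download/symbols"),
    srv_entry(CACHE, "https://symbols.mozilla.org/"),
    # Placeholder for a directory holding symbols from your own build.
    r"C:\path\to\your\build\symbols",
])
print(path)
```

The printed string is what you would assign to _NT_SYMBOL_PATH, or paste into the symbol path dialog under the Trace menu.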
Make sure you select the Trace -> Load Symbols menu option in the Windows Performance Analyzer (xperfview).
All these tools will live, by default, in C:\Program Files\Microsoft Windows Performance Toolkit. Either run these commands from there, or add the directory to your path. You will need to use an elevated command prompt to start or stop profiling.
Start recording data:
xperf -on latency -stackwalk profile
"Latency" is a special provider name that turns on a few predefined kernel providers; run "xperf -providers k" to view a full list of providers and groups. You can combine providers with "+", e.g. "xperf -on DiagEasy+FILE_IO". "-stackwalk profile" tells xperf to capture a stack for each PROFILE event; you could also use "-stackwalk profile+file_io" to capture a stack on each CPU profile tick and each file I/O completion event.
Stop recording and save the trace to a file:
xperf -d out.etl
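If you script your captures, the way providers and stackwalk events are combined with "+" can be sketched as below. This is only an illustration: the helper names are hypothetical, the code builds argument lists and does not run xperf itself, and it uses only the flags shown in the commands above (-on, -stackwalk, -d).

```python
def start_command(providers, stackwalk=()):
    """Build the argument list for starting a kernel trace with xperf.

    providers: provider or group names, joined with '+' for -on.
    stackwalk: event names to capture stacks for, joined with '+'.
    """
    if not providers:
        raise ValueError("at least one provider or group is required")
    args = ["xperf", "-on", "+".join(providers)]
    if stackwalk:
        args += ["-stackwalk", "+".join(stackwalk)]
    return args


def stop_command(output_file):
    """Build the argument list that stops the trace and writes an .etl file."""
    return ["xperf", "-d", output_file]


if __name__ == "__main__":
    print(" ".join(start_command(["latency"], ["profile"])))
    print(" ".join(start_command(["DiagEasy", "FILE_IO"], ["profile", "file_io"])))
    print(" ".join(stop_command("out.etl")))
    # To actually run one of these, pass the list to subprocess.run(...)
    # from an elevated prompt with xperf on your PATH.
```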
The MSDN "Quickstart" page covers this in more detail and has good explanations of how to use xperfview. I'm not going to repeat it here, because I'd be using essentially the same screenshots, so go look there.
xperf has good tools for heap allocation profiling, but they have one major limitation: you can't build with jemalloc and get heap events generated. The stock Windows CRT allocator is horrible about fragmentation, and causes memory usage to rise drastically even if only a small fraction of that memory is in use. Even so, it's a useful way to track allocations/deallocations.
Capturing Heap Data
The "-heap" option is used to set up heap tracing. Firefox generates lots of events, so you may want to play with the BufferSize/MinBuffers/MaxBuffers options as well to ensure that you don't get dropped events. Also, when recording the stack, I've found that a heap trace is often missing module information (I believe this is a bug in xperf). It's possible to get around that by doing a simultaneous capture of non-heap data.
To start a trace session, launching a new Firefox instance:
xperf -on base
xperf -start heapsession -heap -PidNewProcess "./firefox.exe -P test -no-remote" -stackwalk HeapAlloc+HeapRealloc -BufferSize 512 -MinBuffers 128 -MaxBuffers 512
To stop a session and merge the resulting files:
xperf -stop heapsession -d heap.etl
xperf -d main.etl
xperf -merge main.etl heap.etl result.etl
"result.etl" will contain your merged data; you can delete main.etl and heap.etl. Note that it's possible to capture even more data for the non-heap profile; for example, you might want to be able to correlate heap events with performance data, so you can do "xperf -on base -stackwalk profile".
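The ordering of the heap capture steps above (start the base session, start the heap session, then stop the heap session, stop the base session, and merge) can be sketched as one parameterized helper. The function name and its parameters are hypothetical; the flags and default values are copied from the commands on this page, and the sketch only builds argument lists without running anything.

```python
def heap_capture_commands(launch_cmd, session="heapsession",
                          main_etl="main.etl", heap_etl="heap.etl",
                          merged_etl="result.etl"):
    """Return (start, stop) lists of xperf argument lists for a heap capture.

    launch_cmd is the command line of the process to launch and trace,
    passed as a single argument to -PidNewProcess.
    """
    start = [
        # Non-heap session, captured alongside so module info is present.
        ["xperf", "-on", "base"],
        # Heap session that launches the target process.
        ["xperf", "-start", session, "-heap", "-PidNewProcess", launch_cmd,
         "-stackwalk", "HeapAlloc+HeapRealloc",
         "-BufferSize", "512", "-MinBuffers", "128", "-MaxBuffers", "512"],
    ]
    stop = [
        ["xperf", "-stop", session, "-d", heap_etl],
        ["xperf", "-d", main_etl],
        ["xperf", "-merge", main_etl, heap_etl, merged_etl],
    ]
    return start, stop


if __name__ == "__main__":
    start, stop = heap_capture_commands("./firefox.exe -P test -no-remote")
    for args in start + stop:
        print(args)
```

Keeping the launch command as a single list element avoids quoting problems if the lists are later handed to subprocess.run.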
In the viewer, when summary data is viewed for heap events (Heap Allocations Outstanding, etc. all lead to the same summary graphs), three types of allocations are listed: AIFI, AIFO, and AOFI. These are shorthand for "Allocated Inside, Freed Inside", "Allocated Inside, Freed Outside", and "Allocated Outside, Freed Inside". "Inside" and "outside" refer to the time range that was selected for the summary graph; for example, something in the AOFI category was allocated before the start of the selected time range, but its free event happened inside it.
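To make the three categories concrete, here is a minimal Python sketch of how an allocation could be sorted into them for a selected time range. This is my own illustration, not how xperfview is implemented. It assumes timestamps are plain numbers, that the range boundaries count as "inside", and that an allocation never freed during the trace counts as "freed outside"; allocations that fall entirely outside the range are not covered by the description above, so the sketch returns None for them.

```python
def classify(alloc_time, free_time, range_start, range_end):
    """Classify one allocation relative to the selected time range.

    free_time may be None if the allocation was never freed in the trace.
    Returns "AIFI", "AIFO", "AOFI", or None if neither event is inside.
    """
    alloc_inside = range_start <= alloc_time <= range_end
    free_inside = (free_time is not None
                   and range_start <= free_time <= range_end)
    if alloc_inside and free_inside:
        return "AIFI"  # Allocated Inside, Freed Inside
    if alloc_inside:
        return "AIFO"  # Allocated Inside, Freed Outside (or never freed)
    if free_inside:
        return "AOFI"  # Allocated Outside, Freed Inside
    return None


if __name__ == "__main__":
    selected = (0, 10)
    for alloc, free in [(5, 8), (5, 20), (5, None), (-3, 4), (-3, 20)]:
        print(alloc, free, classify(alloc, free, *selected))
```

Under these assumptions, AIFO allocations are the ones still outstanding at the end of the selected range, which is usually where to look first when chasing memory growth.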
- In the summary views, the yellow bar can be dragged left and right to change the grouping -- for example, drag it to the left of the Module column to have grouping happen only by process (stuff that's to the left), so that you get symbols in order of weight, regardless of what module they're in.
- Dragging the columns around will change grouping in various ways; experiment to get the data that you're looking for. Also experiment with turning columns on and off; removing a column will allow data to be aggregated without considering that column's contributions.
To get good data from a Firefox build it is important to build with the following options in your mozconfig.
This disables frame-pointer optimization, which lets xperf do a much better job of unwinding the stack. Traces can be captured fine without this option (for example, from nightlies), but the stack information will not be useful.
This gives us symbols.
For More Information
Microsoft's documentation for xperf is pretty good; there is a lot of depth to this tool, and you should look there for more details.