Performance is a broad topic. This document gives a brief overview of what Firefox OS was designed and optimized for. It also introduces tools and processes developers can use to improve the performance of their code.
What is performance?
Performance is entirely what is perceived by users. Users provide inputs to the system through touch, movement, and speech. In return, they perceive outputs in the form of visual, tactile, and auditory feedback. Performance is the quality of outputs in response to these inputs.
Code that is optimized for a target other than user-perceived performance (hereafter UPP) will lose in competition against code that is, all other things being equal. Users will prefer an application that's responsive and smooth while only processing, say, 1,000 database transactions a second, over another that processes 100,000,000 a second but is choppy and unresponsive.
Of course, that's not to say it's pointless to optimize metrics like the number of database transactions processed per second; it almost always is. The point is that these metrics are secondary, proximate to real UPP targets.
There are a few key performance metrics. The first is "responsiveness". Responsiveness is simply the speed with which the system provides outputs (possibly multiple ones) in response to user inputs. For example, when a user taps the screen, they expect the pixels to change in a certain way. The time elapsed between the "tap" gesture and the change in pixels is the responsiveness metric for that interaction.
Responsiveness sometimes involves multiple stages of feedback. Application launch is one particularly important case, which is discussed in more detail below.
Responsiveness is important for the simple reason that no one wants to be ignored. Every millisecond that elapses between the user providing input to the system and the system finally responding with output is a millisecond that the user is being ignored. Being ignored engenders frustration and anger.
The next key metric is "framerate": the rate at which the system changes pixels displayed to the user. This is a familiar concept; everyone prefers, say, games that display 60 frames per second over ones that display 10 frames per second, even if they can't explain why.
Framerate is important as a "quality of service" metric. Computer displays are designed to "fool" user's eyes, by delivering photons to them that mimic reality. For example, document readers display text by creating arrangements of display pixels that are designed to create photons that strike users' retinas in the same patterns as photons reflected off crisply-struck text on physical paper.
In reality, motion is "continuous" (as inferred by our brains anyway); it's not jerky and discrete, but rather "updates" smoothly. (Strobe lights are fun because they turn that upside down, starving our brains of inputs to create the illusion of discrete reality.) On a computer display, a higher framerate simply allows the display to imitate reality more faithfully.
(An interesting aside is that humans are generally not able to perceive differences in framerate above 60Hz. That's why most modern electronic displays are designed to refresh at that rate. A television probably looks choppy and unrealistic to a hummingbird, for example.)
Memory usage is another key metric. Unlike responsiveness and framerate, users don't directly perceive memory usage. However, memory usage is a close approximation to "user state". An ideal system would maintain 100% of user state at all times: all applications in the system would run simultaneously, and all applications would retain the state created by the user the last time the user interacted with the application. (Application state is stored in computer memory, which is why the approximation is close.)
An important corollary of this is contrary to popular belief: a well-designed system should not be optimized to maximize the amount of free memory. Memory is a resource, and free memory is a unused resource. Rather, a well-designed system should be optimized to use as much memory as possible in service of maintaining user state, while meeting other UPP goals.
Optimizing a system to use memory doesn't mean it should waste memory. Using more memory than is required to maintain some particular user state is wasting a resource that could be used to retain some other user state.
In reality, no system can maintain all user state. Intelligently allocating memory to user state is an important concern that's discussed in more detail below.
The final metric discussed here is power usage. Like memory usage, users don't directly perceive power usage. Users perceive power usage indirectly by their devices being able to maintain all other UPP goals for a longer duration. In service of meeting UPP goals, the system must use only the minimum amount of power required.
The remainder of this document will discuss performance in terms of these metrics.
This section provides a brief overview of how Firefox OS contributes to performance generally, below the level of all applications. From a developer's or user's perspective, this answers the question, what does the platform do for you?
This section assumes the reader is familiar with the basic conceptual design of Firefox OS.
Because the core operating system is built with the same web technologies that applications are built with, the performance of those technologies is critical. There's no "escape hatch". This greatly benefits developers because all the optimizations that enable a performant OS window manager, for example, are available to third-party applications as well. There's no "magic performance sauce" available only to preinstalled code.
HTML/CSS greatly increase productivity, sometimes at the cost of pixel-level control over rendering or a few frames per second. Text and images are reflowed automatically, the system theme is applied to UI elements by default, and "built-in" support is provided for some use cases developers may not think about initially, like different-resolution displays or right-to-left languages.
The canvas element offers a pixel buffer directly to developers to draw on. This gives pixel-level control over rendering and precise control of framerate to developers. But it comes at the expense of extra work needed to deal with multiple resolutions and orientations, right-to-left languages, and so forth. Developers draw to canvases using either a familiar 2D drawing API, or WebGL, a "close to the metal" binding that mostly follows OpenGL ES 2.0.
(Somewhere "in between" HTML/CSS and canvas is SVG, which is beyond the scope of this document.)
The graphics pipeline in Gecko underlying HTML/CSS and canvas is optimized in several ways. The HTML/CSS layout and graphics code in Gecko minimizes invalidation and repainting for common cases likes scrolling; developers get this support "for free". Pixel buffers painted by both Gecko "automatically" and applications to canvas "manually" minimize copies when being drawn to the display framebuffer. This is done by avoiding intermediate surfaces where they would create overhead (such as per-application "back buffers" in many other operating systems), and by using special memory for graphics buffers that can be directly accessed by the compositor hardware. Complex scenes are rendered using the device's GPU for maximum performance. To improve power usage, simple scenes are rendered using special dedicated composition hardware, while the GPU idles or turns off.
Fully static content is the exception rather than the rule for rich applications. Rich applications use dynamic content with animations, transitions, and other effects. Transitions and animations are particularly important to applications. Developers can use CSS to declare even complicated transitions and animations with a simple, high-level syntax. In turn, Gecko's graphics pipeline is highly optimized to render common animations efficiently. Common-case animations are "offloaded" to the system compositor, which can render them both performantly and power efficiently.
The runtime performance of applications is important, but just as important is their startup performance. Firefox OS improves startup experience in several ways.
Gecko is optimized to load a wide variety of content efficiently: the entire Web! Many years of improvements targeting this content, like parallel HTML parsing, intelligent scheduling of reflows and image decoding, clever layout algorithms, etc, translate just as well to improving web applications on Firefox OS. The content is written using the same technologies.
Each web application has its own instance of the Gecko rendering engine. Starting up this large, complicated engine is not free, and because of that, Firefox OS keeps around a preallocated copy of the engine in memory. When an app starts up, it takes over this preallocated copy and can immediately begin loading its application resources.
Applications "start" most quickly when they're already running. To this end, Firefox OS tries to keep as many applications running in the background as possible, while not regressing the user experience in foreground applications. This is implemented by intelligently prioritizing applications, and discarding background applications according to their priorities when memory is low. For example, it's more disruptive to a user if their currently-playing music player is discarded in the background, while their background calculator application keeps running. So, the music player is prioritized above the calculator automatically by Firefox OS and the calculator is discarded first when memory is low.
Firefox OS prevents applications that are running in the background from impacting the user experience of foreground applications through two mechanisms. First, timers created by background apps are "throttled" to run at a low frequency. Second, background applications are given a low CPU priority, so that foreground applications can get CPU time when they need it.
In addition to the above, Firefox OS includes several features designed to improve power usage that are common to mobile operating systems. The Firefox OS kernel will eagerly suspend the device for minimal power usage when the device is idle. Relatedly, ICs like the GPU, cellular radio, and Wifi radio are powered down when not being actively used. Firefox OS also takes advantage of hardware support for media decoding.
This section is intended for developers asking the question: "how can I make my app fast"?
Application startup is punctuated by three user-perceived events, generally speaking. The first is the application "first paint": the point at which sufficient application resources have been loaded to paint an initial frame. Second is when the application becomes interactive; for example, users are able to tap a button and the application responds. The final event is "full load", for example when all the user's albums have been listed in a music player.
The key to fast startup is to keep two things in mind: UPP is all that matters, and there's a "critical path" to each user-perceived event above. The critical path is exactly and only the code that must run to produce the event.
For example, to paint an application's first frame that comprises visually some HTML and CSS to style that HTML, (i) the HTML must be parsed; (ii) the DOM for that HTML must be constructed; (iii) resources like images in that part of the DOM have to be loaded and decoded; (iv) the CSS styles must be applied to that DOM; (v) the styled document has to be reflowed. Nowhere in that list is "load the JS file needed for an uncommon menu"; "fetch and decode the image for the High Scores list"; etc. Those work items are not on the critical path to painting the first frame.
It seems obvious, but to reach a user-perceived startup event more quickly, the main "trick" is to just not run code that's off the critical path. Alternatively, shorten the critical path by simplifying the scene.
Another problem that can delay startup is idle time, caused by waiting on responses to requests like database loads. To avoid this problem, applications can "front load" the work by issuing requests as early as possible in startup. Then when the data is needed later, it's hopefully already been fetched and the application doesn't need to wait.
Relatedly, it's important to separate network requests for dynamic data from static content that can be cached locally. Locally-cached resources can be loaded much more quickly than they can be fetched over high-latency and lower-bandwidth mobile networks. Network requests should never be on the critical path to early application startup. Caching resources locally is also the only way applications can be used when "offline". Firefox OS allows applications to cache resources by either being "packaged" in a compressed ZIP file or "hosted" through HTML5 appcache. How to choose between these options for a particular type of application is beyond the scope of this document, but in general application packages provide optimal load performance; appcache is slower.
A few other hints are listed below:
Don't include scripts or stylesheets that don't participate in the critical path in your startup HTML file. Load them when needed.
Use the "defer" or "async" attribute on script tags needed at startup. This allows HTML parsers to process documents more efficiently.
Don't force the web engine to construct more DOM than is needed. A "hack" to do this simply is to leave your HTML in the document, but commented out.
<div id="foo"><!-- <div> ... --></div>
When that part of the document needs to be rendered, load the commented HTML.
foo.innerHTML = foo.firstChild.nodeValue;
Use Web Worker Threads for background processing. Only the application "main thread" can process user events and render primary UI. But a common use case is to fetch some data, process it, then update the UI. Use Worker Threads for this work and keep the main thread free for interacting with the user.
The first important consideration for achieving high framerate is to select the right tool for the job. Mostly static content that's scrolled and infrequently animated is usually best implemented with HTML/CSS. Highly dynamic content like games that need tight control over rendering, and don't need theming, is often best implemented with canvas.
For content drawn using canvas, it's up to the developer to hit framerate targets: they have direct control over what's drawn.
For HTML/CSS content, the path to high framerate is to use the right primitives. Firefox OS is highly optimized to scroll arbitrary content; this is usually not a concern. But often trading some generality and quality for speed, such as using a static rendering instead of a CSS radial gradient, can push scrolling framerate over a target. CSS media queries allow these compromises to be restricted only to devices that need them.
Many applications use transitions or animations through "pages", or "panels". For example, the user taps a "Settings" button to transition into an application configuration screen, or a settings menu "pops up". Firefox OS is highly optimized to transition and animate scenes that
- Use pages/panels that are approximately the size of the device screen or smaller
- Transition/animate the CSS transform and opacity properties
Transitions and animations that adhere to these guidelines can be offloaded to the system compositor and run maximally efficiently.
To help diagnose low framerates, see the section below.
Memory and power usage
Improving memory and power usage is a similar problem to speeding up startup: don't do unnecessary work; use efficient data structures; lazily load uncommonly-used UI resources; ensure resources like images are optimized well.
Modern CPUs can enter a lower-power mode when mostly idle. Applications that constantly fire timers or keep unnecessary animations running prevent CPUs from entering low-power mode. Power-efficient applications don't do that.
When applications are sent to the background, a visibilitychange event is fired on their documents. This event is a developer's friend; applications should listen for it. As mentioned above, Firefox OS tries to keep as many applications running simultaneously as it can, but does have to discard applications sometimes. Applications that drop as many loaded resources as possible when sent to the background will use less memory and be less likely to be discarded. This in turn means they will "start up" faster (by virtue of already being running) and have better UPP.
Similarly, applications should prepare for the case when they are discarded. To improve user-perceived memory usage, that is to say, making the user feel that more of their state is being preserved, applications should save the state needed to return the current view if discarded. If the user is editing an entry, for example, the state of the edit should be saved when entering the background.
Measuring performance and diagnosing problems
Before beginning to measure and diagnose performance problems, remember this:
Must. Test. On. Device.
A great strength of the web platform is that the same code written for "desktop" web browsers runs on Firefox OS on mobile devices. Developers should use this to improve productivity: develop on "desktops", in comfortable and productive environments, as much as possible.
But when it comes time to test performance, mobile devices must be used. Modern desktops can be more than 100x more powerful than mobile hardware. The lower-end the mobile hardware tested on, the better.
With that caveat, the sections below describe tools and processes for measuring and diagnosing performance issues.
Firefox OS comes built-in with some convenient and easy-to-use tools that, when used properly, can be used to quickly measure performance. The first tool is the "framerate monitor". This can be enabled in the Firefox OS Settings application.
The framerate monitor continuously reports two numbers. The values reported are an average of recent results within a sliding window, meant to be "instantaneous" but fairly accurate. As such, both numbers are "guesses". The left number is the "composition rate": the estimated number of times per second Firefox OS is drawing frames to the hardware framebuffer. This is an estimate of the user-perceived framerate, and only an estimate. For example, the counter may report 60 compositions per second even if the screen is not changing. In that case the user-perceived framerate would be 0. However, when used with this caveat in mind and corroborated with other measurements, the monitor can be a useful and simple tool.
The rightmost number is the "layer transaction rate", the estimated number of times per second processes are repainting and notifying the compositor. This number is mostly useful for Gecko platform engineers, but it should be less than or equal to the composition rate number on the left.
Firefox OS also has a tool that can help measure startup time, specifically the "first paint" time described above. This "time-to-load" tool can be enabled using the Settings application. The value shown by the tool is the elapsed time between when the most recent application was launched, and an estimate of the first time that application painted its UI. This number only approximates the real "first paint" time, and in particular underestimates it. However, lowering this number almost always correlates to improvements in real startup time, so it can be useful to quickly measure optimization ideas.
For accurately measuring both startup times and responsiveness, a high-speed camera is indispensable. "High-speed" means that the camera can record 120 frames per second or above. The higher the capture rate, the more accurate the measurements that can be made. This may sound like exotic technology, but consumer models can be purchased for a few hundred US dollars.
The measuring process with these cameras is simple: record the action to be studied, and then play back the capture and count the number of frames that elapse between the input (say, a tap gesture) and the desired output (pixels changing in some way). Divide the number of counted frames by the capture rate, and the resulting number is the measured duration.
Mozilla built an automated tool called Eideticker which operates on the same principle as described above. The difference is that Eideticker uses synthetic user input events and HDMI capture to measure durations. The code is available and can be used with any device with an HDMI output.
Measuring power can be more difficult. It's possible to jury-rig measurement apparatus with a soldering iron, but a good approximation of power usage can be gathered by observing CPU load. Simple command-line tools like |top| allow monitoring CPU usage continuously.
In general, when measuring performance, don't be proud! "Primitive technology" like a stopwatch or logging, when used effectively, can provide eminently usable data.
Diagnosing performance problems
If performance measurements show an application is below its targets, how can the underlying problem be diagnosed?
The first step of any performance work is to create a reproducible workload and reproducible measurement steps. Then gather baseline measurements, before any code changes are made. It seems obvious, but this is required to determine whether code changes actually improve performance! The measurement process selected isn't too important; what's important is that the process be (i) reproducible; (ii) realistic, in that it measures what users will perceive as closely as possible; (iii) precise as possible; (iv) accurate as possible. Even stopwatch timings can fit this spec.
Firefox OS includes two built-in tools for quickly diagnosing some performance issues. The first is a render mode called "paint flashing". In this mode, every time a region of the screen is painted by Gecko, Gecko blits a random translucent color over the painted region. Ideally, only parts of the screen that visually change between frames will "flash" with a new color. But sometimes more area than is needed is repainted, causing large areas to "flash". This symptom may indicate that application code is forcing too much of its scene to update. It may also indicate bugs in Gecko itself.
The second tool is called "animation logging", and can also be enabled in Settings. This tool tries to help developers understand why animations are not offloaded to the compositor to be run efficiently as possible. It reports "bugs" like trying to animate elements that are too large, or trying to animate CSS properties that can't be offloaded.
I/Gecko ( 5644): Performance warning: Async animation disabled because frame size (1280, 410) is bigger than the viewport (360, 518) [div with id 'views']
A common pitfall is to animate left/top/right/bottom properties instead of using CSS transforms to achieve the same effect. For a variety of reasons, the semantics of transforms make them easier to offload, but left/top/right/bottom are much more difficult. Animation logging will report this.
Similarly, advanced users may wish to use a whole-system profiler like the linux |perf| tool. This is mostly useful for platform engineers, though.
As with measuring performance, don't be proud about tools used to diagnose it! A few well-placed Date.now() calls with logging can often provide a quick and accurate answer.
Finally, the only way to keep improving performance is to not regress it. The only way to not regress performance is to test it, preferably with automated tests. A full discussion of that topic is beyond the scope of this document, though.
Paris Firefox OS Performance & Optimization Workshop, March 4 - 8, 2013
To illustrate these concepts here are some videos and slides from the Paris Workshop dedicated to performances and optimizations.
Part 1: Technical basics and more (Gabriele & Thomas)
Part 2: Performances in a UX point of view (Josh)
Part 3: Performances measurement & automation (Julien & Anthony)