Performance fundamentals

Performance means efficiency. This document gives a wide overview of open web application performance: what it is, how the browser platform helps to improve it, and what tools and processes you can use to test and improve it.

What is performance?

Performance is entirely what is perceived by users. Users provide inputs to the system through touch, movement, and speech. In return, they perceive outputs in the form of visual, tactile, and auditory feedback. Performance is the quality of outputs in response to these inputs.

Code optimized for a target other than user-perceived performance (hereafter UPP) will lose out to code that is optimized for UPP, all other things being equal. Users will prefer an application that is responsive and smooth while processing, say, 1,000 database transactions per second over one that processes 100,000,000 per second but is choppy and unresponsive. Of course, that is not to say it's pointless to optimize metrics like the number of database transactions processed per second. The point is that these metrics are of secondary importance relative to real UPP targets.

There are a few key performance metrics; the next few subsections identify and discuss them.

Responsiveness

The first is responsiveness, which is simply the speed with which the system provides outputs (possibly multiple ones) in response to user inputs. For example, when a user taps the screen, they expect the pixels to change in a certain way. The time elapsed between the "tap" gesture and the change in pixels is the responsiveness metric for that interaction.
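As a rough illustration, the responsiveness of one interaction could be estimated by timing how long its handler takes to produce output. The sketch below is not taken from any library: timeInteraction and its injectable now clock are hypothetical names, and it only captures the time spent in the handler, not the time until the pixels actually change on screen.

```javascript
// Hypothetical helper: times how long a handler takes to respond to an input.
// `now` defaults to the high-resolution clock and can be swapped out in tests.
function timeInteraction(handler, now = () => performance.now()) {
  const start = now();
  const result = handler();
  const elapsedMs = now() - start;
  return { result, elapsedMs };
}
```

A tap handler could be wrapped in this helper during development to log interactions that take longer than a chosen budget.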

Responsiveness sometimes involves multiple stages of feedback. Application launch is one particularly important case, which is discussed in more detail below.

Responsiveness is important for the simple reason that no one wants to be ignored. Every millisecond that elapses between the user providing input to the system and the system finally responding with output is a millisecond that the user is being ignored. Being ignored engenders frustration and anger.

Framerate

The next key metric is framerate, the rate at which the system changes pixels displayed to the user. This is a familiar concept: everyone prefers, say, games that display 60 frames per second over ones that display 10 frames per second, even if they can't explain why.

Framerate is important as a "quality of service" metric. Computer displays are designed to "fool" users' eyes by delivering photons to them that mimic reality. For example, document readers display text by creating arrangements of display pixels that are designed to create photons that strike users' retinas in the same patterns as photons reflected off crisply-struck text on physical paper.

In reality, motion is "continuous" (as inferred by our brains anyway); it's not jerky and discrete, but rather "updates" smoothly. (Strobe lights are fun because they turn that upside down, starving our brains of inputs to create the illusion of discrete reality.) On a computer display, a higher framerate simply allows the display to imitate reality more faithfully.

Note: Most people have difficulty perceiving differences in framerate above about 60 Hz, which is one reason most modern electronic displays are designed to refresh at that rate. A television probably looks choppy and unrealistic to a hummingbird, for example.
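To make the 60 frames per second target concrete, it helps to turn it into a time budget per frame. The following sketch uses two hypothetical helpers, frameBudgetMs and averageFps; the timestamps are assumed to be in milliseconds, such as those passed to animation callbacks.

```javascript
// Time available to produce each frame at a target framerate, in milliseconds.
function frameBudgetMs(targetFps) {
  return 1000 / targetFps;
}

// Average framerate over a list of frame timestamps (in milliseconds).
function averageFps(timestamps) {
  if (timestamps.length < 2) return 0;
  const elapsedMs = timestamps[timestamps.length - 1] - timestamps[0];
  if (elapsedMs <= 0) return 0;
  return ((timestamps.length - 1) * 1000) / elapsedMs;
}
```

At 60 frames per second the budget is roughly 16.7 milliseconds: all the work for a frame has to fit inside that window, or the frame is dropped.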

Memory usage

Memory usage is another key metric. Unlike responsiveness and framerate, users don't directly perceive memory usage. However, memory usage is a close approximation to "user state". An ideal system would maintain 100% of user state at all times: all applications in the system would run simultaneously, and all applications would retain the state created by the user the last time the user interacted with the application. (Application state is stored in computer memory, which is why the approximation is close.)

An important corollary runs contrary to popular belief: a well-designed system should not be optimized to maximize the amount of free memory. Memory is a resource, and free memory is an unused resource. Rather, a well-designed system should be optimized to use as much memory as possible in service of maintaining user state, while meeting other UPP goals.

Optimizing a system to use memory effectively doesn't mean it should waste memory. Using more memory than is required to maintain some particular user state is wasting a resource that could be used to retain some other user state in the system. In reality, no system can maintain all user state. Intelligently allocating memory to user state is an important concern that's discussed in more detail below.

Power usage

The final metric discussed here is power usage. As with memory usage, users don't perceive power usage directly. They perceive it indirectly, through their devices being able to maintain all other UPP goals for a longer duration. While meeting UPP goals, the system should use only the minimum amount of power required.

The remainder of this document will discuss performance in terms of these metrics.

Platform performance optimizations

This section provides a brief overview of how Firefox/Gecko contributes to performance generally, below the level of all applications. From a developer's or user's perspective, this answers the question "what does the platform do for you?"

Web technologies

The web platform provides many tools, some better suited for particular jobs than others. All application logic is written in JavaScript. For displaying graphics, developers can choose between the high-level declarative languages of HTML and CSS, or use low-level imperative interfaces offered by the <canvas> element (which includes WebGL). Somewhere "in between" HTML/CSS and canvas is SVG, which offers some benefits of both.

HTML and CSS greatly increase productivity, sometimes at the cost of pixel-level control over rendering or a few frames per second. Text and images are reflowed automatically, the system theme is applied to UI elements by default, and "built-in" support is provided for some use cases developers may not think about initially, like different-resolution displays or right-to-left languages.

The canvas element offers a pixel buffer directly to developers to draw on. This gives pixel-level control over rendering and precise control of framerate to developers. But it comes at the expense of extra work needed to deal with multiple resolutions and orientations, right-to-left languages, and so forth. Developers draw to canvases using either a familiar 2D drawing API, or WebGL, a "close to the metal" binding that mostly follows OpenGL ES 2.0.

Note: Firefox OS is optimized for applications built with web technologies: HTML, CSS, JavaScript, etc. Except for a handful of basic system services, all the code that runs in Firefox OS is web applications and the Gecko engine. Even the operating system window manager is written in HTML, CSS and JavaScript. Because the core operating system is built with the same web technologies that applications are built with, the performance of those technologies is critical. There's no "escape hatch". This greatly benefits developers because all the optimizations that enable a performant OS are available to third-party applications as well. There's no "magic performance sauce" available only to preinstalled code. See Firefox OS performance testing for more details relevant to Firefox OS.

Gecko rendering

The Gecko JavaScript engine supports just-in-time (JIT) compilation. This enables application logic to perform comparably to other virtual machines — such as Java virtual machines — and in some cases even close to "native code".

The graphics pipeline in Gecko that underpins HTML, CSS and Canvas is optimized in several ways. The HTML/CSS layout and graphics code in Gecko minimizes invalidation and repainting for common cases like scrolling; developers get this support "for free". Pixel buffers, whether painted "automatically" by Gecko or "manually" by applications drawing to canvas, are copied as few times as possible on their way to the display framebuffer. This is done by avoiding intermediate surfaces where they would create overhead (such as per-application "back buffers" in many other operating systems), and by using special memory for graphics buffers that can be directly accessed by the compositor hardware. Complex scenes are rendered using the device's GPU for maximum performance. To improve power usage, simple scenes are rendered using special dedicated composition hardware, while the GPU idles or turns off.

Fully static content is the exception rather than the rule for rich applications. Rich applications use dynamic content with animation and transition effects. Transitions and animations are particularly important to applications: developers can use CSS to declare complicated behaviour with a simple, high-level syntax. In turn, Gecko's graphics pipeline is highly optimized to render common animations efficiently. Common-case animations are "offloaded" to the system compositor, which can render them in a performant and power-efficient fashion.

The runtime performance of applications is important, but just as important is their startup performance. Gecko is optimized to load a wide variety of content efficiently: the entire Web! Many years of improvements targeting this content, like parallel HTML parsing, intelligent scheduling of reflows and image decoding, clever layout algorithms, etc., translate just as well to improving web applications on Firefox.

Note: See Firefox OS performance testing for more information about Firefox OS specifics that help to further improve startup performance.

Application performance

This section is intended for developers asking the question: "how can I make my app fast?"

Startup performance

Application startup is punctuated by three user-perceived events, generally speaking.

  • The first is the application first paint — the point at which sufficient application resources have been loaded to paint an initial frame.
  • Second is when the application becomes interactive — for example, users are able to tap a button and the application responds.
  • The final event is full load — for example when all the user's albums have been listed in a music player.

The key to fast startup is to keep two things in mind: UPP is all that matters, and there's a "critical path" to each user-perceived event above. The critical path is exactly and only the code that must run to produce the event.

For example, to paint the first frame of an application whose initial view consists of some HTML and the CSS that styles it:

  1. The HTML must be parsed.
  2. The DOM for that HTML must be constructed.
  3. Resources like images in that part of the DOM have to be loaded and decoded.
  4. The CSS styles must be applied to that DOM.
  5. The styled document has to be reflowed.

Nowhere in that list is "load the JS file needed for an uncommon menu" or "fetch and decode the image for the High Scores list". Those work items are not on the critical path to painting the first frame.

It seems obvious, but to reach a user-perceived startup event more quickly, the main "trick" is to just not run code that's off the critical path. Alternatively, shorten the critical path by simplifying the scene.

The web platform is highly dynamic. JavaScript is a dynamically-typed language, and the web platform allows loading code, HTML, CSS, images, and other resources dynamically. These features can be used to defer work at startup that's off the critical path, by loading the unnecessary content "lazily" some time after startup.
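One simple way to defer off-critical-path work is to wrap it so that it only runs the first time it's actually needed. The helper below is a minimal sketch, not part of any platform API; lazy and the menu example are hypothetical names used for illustration.

```javascript
// Hypothetical helper: wraps an expensive loader so it runs only on first use.
function lazy(loader) {
  let loaded = false;
  let value;
  return function get() {
    if (!loaded) {
      value = loader();
      loaded = true;
    }
    return value;
  };
}

// Example: the uncommon menu is only built when the user first opens it,
// so none of this work happens on the way to the first paint.
const getUncommonMenu = lazy(() => ({ items: ['Export', 'Reset'] }));
```

Nothing is built at startup; the first call to getUncommonMenu() pays the cost, and later calls reuse the result.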

Another problem that can delay startup is idle time, caused by waiting on responses to requests like database loads. To avoid this problem, applications can "front load" the work by issuing requests as early as possible in startup. Then when the data is needed later, it's hopefully already been fetched and the application doesn't need to wait.
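"Front loading" can be sketched with promises: start the request as soon as possible and keep hold of the pending promise, so that later code waits only for whatever time remains. frontLoad is a hypothetical helper, and loadSettingsFromDatabase in the usage comment is an assumed application function.

```javascript
// Hypothetical helper: starts a request immediately and hands back a getter
// for the pending promise, so later code awaits work that is already underway.
function frontLoad(startRequest) {
  const pending = startRequest(); // issued right away, early in startup
  return () => pending;
}

// Usage sketch (loadSettingsFromDatabase is an assumed application function):
//   const getSettings = frontLoad(() => loadSettingsFromDatabase());
//   ...the rest of startup runs while the request is in flight...
//   const settings = await getSettings();
```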

Note: For much more information on improving startup performance, read Optimizing startup performance.

Relatedly, it's important to separate network requests for dynamic data from static content that can be cached locally. Locally-cached resources can be loaded much more quickly than they can be fetched over high-latency and lower-bandwidth mobile networks. Network requests should never be on the critical path to early application startup. Caching resources locally is also the only way applications can be used when "offline", and for standard open web apps, at the moment this requires use of HTML5 AppCache.

Note: Firefox OS allows applications to cache resources by being installed as applications, either being "packaged" in a compressed ZIP file or "hosted" through HTML5 AppCache. How to choose between these options for a particular type of application is beyond the scope of this document, but in general application packages provide optimal load performance; AppCache is slower. Installable apps will hopefully be coming to other platforms soon!

Framerate

The first important consideration for achieving high framerate is to select the right tool for the job. Content that is mostly static, scrolled and infrequently animated is usually best implemented with HTML and CSS. Highly dynamic content like games that need tight control over rendering, and don't need theming, is often best implemented with Canvas.

For content drawn using Canvas, it's up to the developer to hit framerate targets: they have direct control over what's drawn.

For HTML and CSS content, the path to high framerate is to use the right primitives. Firefox is highly optimized to scroll arbitrary content; this is usually not a concern. But often trading some generality and quality for speed, such as using a static rendering instead of a CSS radial gradient, can push scrolling framerate over a target. CSS media queries allow these compromises to be restricted only to devices that need them.

Many applications use transitions or animations between "pages" or "panels". For example, the user taps a "Settings" button to transition into an application configuration screen, or a settings menu "pops up". Firefox is highly optimized to transition and animate scenes that:

  • Use pages/panels that are approximately the size of the device screen or smaller
  • Transition/animate the CSS transform and opacity properties

Transitions and animations that adhere to these guidelines can be offloaded to the system compositor and run maximally efficiently.

Memory and power usage

Improving memory and power usage is a similar problem to speeding up startup: don't do unnecessary work, use efficient data structures, lazily load uncommonly-used UI resources, and ensure resources like images are optimized well.

Modern CPUs can enter a lower-power mode when mostly idle. Applications that constantly fire timers or keep unnecessary animations running prevent CPUs from entering low-power mode. Power-efficient applications shouldn't do that.

When applications are sent to the background, a visibilitychange event is fired on their documents. This event is a developer's friend; applications should listen for it. Applications that drop as many loaded resources as possible when sent to the background will use less memory (and, in the case of Firefox OS, be less likely to be discarded; see the note below). This in turn means they will "start up" faster (by virtue of already being running) and have better UPP.

Note: As mentioned above, Firefox OS tries to keep as many applications running simultaneously as it can, but does have to discard applications sometimes, usually when the device runs out of memory. To find out more about how Firefox OS manages memory usage and kills apps when out-of-memory issues are encountered, read Debugging out of memory errors on Firefox OS.
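The visibilitychange handling described above might look something like the following sketch. manageBackgroundWork is a hypothetical helper; it takes the document as a parameter (normally the global document) and uses the standard document.hidden property to decide which callback to run.

```javascript
// Registers callbacks that run when the document is hidden or shown again.
// `doc` is passed in (normally `document`) so the logic can be tested in isolation.
function manageBackgroundWork(doc, { onHidden, onVisible }) {
  function handleChange() {
    if (doc.hidden) {
      onHidden();   // e.g. stop timers, pause animations, release caches
    } else {
      onVisible();  // e.g. restart timers, reload what is needed
    }
  }
  doc.addEventListener('visibilitychange', handleChange);
  // Return a function that removes the listener again.
  return () => doc.removeEventListener('visibilitychange', handleChange);
}
```

What the callbacks actually release is up to the application; decoded images and large in-memory caches that can be rebuilt are typical candidates.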

Specific coding tips for application performance

The following practical tips will help improve one or more of the Application performance factors discussed above.

Use CSS animations and transitions

Instead of using some library’s animate() function, which probably relies on poorly performing techniques (window.setTimeout() or top/left positioning, for example), use CSS animations. In many cases, CSS transitions are enough to get the job done. This works well because the browser is designed to optimize these effects and can use the GPU to handle them smoothly with minimal impact on processor performance. Another benefit is that you can define these effects in CSS along with the rest of your app's look-and-feel, using a standardized syntax.

CSS animations give you very granular control over your effects using keyframes, and you can even watch events fired during the animation process in order to handle other tasks that need to be performed at set points in the animation process. You can easily trigger these animations with the :hover, :focus, or :target pseudo-classes, or by dynamically adding and removing classes on parent elements.
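Triggering an animation by adding a class, and reacting to the animationend event when it finishes, could be sketched as follows. playAnimation is a hypothetical helper, and the class name passed to it is assumed to be defined in the app's CSS with an animation declaration.

```javascript
// Starts a CSS animation by adding a class, and cleans up when it finishes.
// `className` is assumed to be defined in the app's CSS with an `animation` rule.
function playAnimation(el, className, onDone) {
  function handleEnd() {
    el.removeEventListener('animationend', handleEnd);
    el.classList.remove(className); // allow the animation to be replayed later
    if (onDone) onDone();
  }
  el.addEventListener('animationend', handleEnd);
  el.classList.add(className);
}
```

The JavaScript only flips a class; the timing, easing, and keyframes all stay in CSS, where the browser can optimize them.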

If you want to create animations on the fly or modify them in JavaScript, James Long has written a simple library for that called CSS-animations.js.

Use CSS transforms

Instead of tweaking absolute positioning and fiddling with all that math yourself, use the transform CSS property to adjust the position, scale, and so forth of your content. The reason is, once again, hardware acceleration. The browser can do these tasks on your GPU, letting the CPU handle other things.

In addition, transforms give you capabilities you might not otherwise have. Not only can you translate elements in 2D space, but you can transform in three dimensions, skew and rotate, and so forth. Paul Irish has an in-depth analysis of the benefits of translate() from a performance point of view. In general, however, you have the same benefits you get from using CSS animations: you use the right tool for the job and leave the optimization to the browser. You also use an easily extensible way of positioning elements — something that needs a lot of extra code if you simulate translation with top and left positioning. Another bonus is that this is just like working in a canvas element.
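As a small example, moving an element with a transform rather than with top and left comes down to setting a single style property. moveTo is a hypothetical helper name; the element is assumed to be any object with a style property, such as a DOM element.

```javascript
// Positions an element with a transform instead of top/left.
function moveTo(el, x, y) {
  el.style.transform = `translate(${x}px, ${y}px)`;
}
```

Because only transform changes, the element's place in the document layout stays the same while its on-screen position moves.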

Note: You may need to attach a translateZ(0) transform if you wish to get hardware acceleration on your CSS animations, depending on platform. This can improve performance, but it can also increase memory consumption. What you do in this regard is up to you: do some testing and find out what's best for your particular app.

Use requestAnimationFrame() instead of setInterval()

Calls to window.setInterval() run code at a presumed frame rate that may or may not be possible under current circumstances. It tells the browser to render results even while the browser isn't actually drawing; that is, while the video hardware hasn't reached the next display cycle. This wastes processor time (and can even lead to reduced battery life on the user's device).

Instead, you should try to use window.requestAnimationFrame(). This waits until the browser is actually ready to start building the next frame of your animation, and won't bother if the hardware isn't going to actually draw anything. Another benefit to this API is that animations won't run while your app isn't visible on the screen (such as if it's in the background and some other task is operating). This will save battery life and prevent users from cursing your name into the night sky.
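A minimal animation loop built on requestAnimationFrame() might look like the sketch below. animate is a hypothetical helper; the raf parameter defaults to the browser's requestAnimationFrame and is only injectable so that the loop can be exercised outside a browser.

```javascript
// Drives an animation from requestAnimationFrame timestamps.
// `draw` receives a progress value between 0 and 1.
function animate(durationMs, draw, raf = globalThis.requestAnimationFrame) {
  let start = null;
  function step(timestamp) {
    if (start === null) start = timestamp;
    // Progress is based on elapsed time, not on the number of frames drawn,
    // so the animation takes the same time even if frames are skipped.
    const progress = Math.min((timestamp - start) / durationMs, 1);
    draw(progress);
    if (progress < 1) raf(step);
  }
  raf(step);
}
```

Each callback schedules the next one only while the animation is still running, so no work is queued once it has finished.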

Make events immediate

As old-school, accessibility-aware web developers, we love click events because they come with the added benefit of supporting keyboard input. On mobile devices, however, they are too slow. Use touchstart and touchend instead: these events don’t have the delay that makes interaction with the app appear sluggish. If you test for touch support first, you don’t sacrifice accessibility either. For example, the Financial Times uses a library called fastclick for that purpose, which is available for you to use.
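Testing for touch support before choosing an event could be sketched like this. bindActivate is a hypothetical helper, and the supportsTouch flag is assumed to come from a check such as 'ontouchstart' in window. Devices that offer both touch and keyboard input may need both listeners, so treat this as a starting point rather than a complete solution.

```javascript
// Binds an activation handler using touch events where available,
// falling back to click so keyboard and mouse users are still served.
// `supportsTouch` would typically be `'ontouchstart' in window`.
function bindActivate(el, handler, supportsTouch) {
  const eventName = supportsTouch ? 'touchend' : 'click';
  el.addEventListener(eventName, handler);
  return eventName;
}
```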

Keep your interface simple

One big performance issue we found in HTML5 apps was that moving lots of DOM elements around makes everything sluggish, especially when they feature lots of gradients and drop shadows. Simplifying your look-and-feel and moving a proxy element around when you drag and drop helps a lot.

When, for example, you have a long list of elements (let’s say tweets), don’t move them all. Instead, keep in your DOM tree only the ones that are visible and a few on either side of the currently visible set of tweets. Hide or remove the rest. Keeping the data in a JavaScript object instead of accessing the DOM can vastly improve your app's performance. Think of the display as a presentation of your data rather than the data itself. That doesn’t mean you can't use straight HTML as the source; just read it once and then scroll 10 elements, changing the content of the first and last according to your position in the results list, instead of moving 100 elements that aren’t visible. The same trick applies in games to sprites: if they aren’t currently on the screen, there is no need to poll them. Instead, re-use elements that scroll off screen for the new ones coming in.
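Working out which items need to be in the DOM can be reduced to a little arithmetic. The sketch below assumes that every row has the same fixed height; visibleRange and its parameters are hypothetical names, and overscan is the number of extra rows kept on either side of the visible set.

```javascript
// Returns the half-open index range [start, end) of rows worth keeping in the DOM.
// Assumes every row has the same height, in pixels.
function visibleRange(scrollTop, viewportHeight, rowHeight, totalRows, overscan = 2) {
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const endVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    start: Math.max(0, firstVisible - overscan),
    end: Math.min(totalRows, endVisible + overscan),
  };
}
```

On each scroll update, the application renders only the data items in that range, re-using the elements that have just scrolled out of it.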

General application performance analysis

Firefox, Chrome and other browsers include built-in tools that can help you diagnose slow page rendering. In particular, Firefox's Network Monitor will display a precise timeline of when each network request on your page happens, how large it is, and how long it takes.

The Firefox network monitor showing get requests, multiple files, and different times taken to load each resource on a graph.

If your page contains JavaScript code that is taking a long time to run, the JavaScript profiler will pinpoint the slowest lines of code:

The Firefox JavaScript profiler showing a completed profile 1.

The Built-in Gecko Profiler is a very useful tool that provides even more detailed information about which parts of the browser code are running slowly while the profiler runs. This is a bit more complex to use, but provides a lot of useful details.

A built-in Gecko profiler window showing a lot of network information.

Note: You can use these tools with the Android browser by running Firefox and enabling remote debugging.

Using YSlow (which requires Firebug) provides extremely helpful recommendations for improving performance. Many of the problems it identifies and the solutions it suggests will be especially useful for mobile browsers. You should definitely run YSlow and follow its recommendations.

A YSlow window showing a set of tips for improving performance, the top one being make fewer http requests.

In particular, making a large number (dozens or hundreds) of network requests can take longer in mobile browsers. Rendering of large images and CSS gradients can also take longer. Simply downloading large files can take longer, even over a fast network, because mobile hardware isn't always fast enough to utilize all the available bandwidth. For useful general tips on mobile web performance, have a look at Maximiliano Firtman's Mobile Web High Performance talk.

Testcases and submitting bugs

If the Firefox and Chrome developer tools don't help you find a problem, or if they seem to indicate that the problem is caused by the web browser, then try to provide a reduced test case that isolates the problem as much as possible. This can often help in diagnosing problems.

See if you can reproduce the problem by saving and loading a static copy of an HTML page (including any images, stylesheets, and scripts it embeds). If that works, you can then edit the static files to remove any private information, then send them to others for help (submit a Bugzilla report, for example, or host them on a server and share a URL). You should also share any profiling information you've collected using the tools listed above.

Document Tags and Contributors

Last updated by: Anniej333,