
    Continuous Integration

    When you push a commit to mozilla-central or a related repository, it initiates a large chain of builds and tests across multiple types of infrastructure.  This document will help you understand all the pieces that comprise Mozilla's continuous integration systems.

    Buildbot and Treeherder

    The first system to pick up changes pushed to hg is buildbot, Mozilla's primary continuous integration tool.  Buildbot generates binary builds for Firefox, Firefox for Android, and Firefox OS across a variety of operating systems.  After the builds are completed, they are used to run a series of correctness and performance tests.

    The results of buildbot jobs (both builds and tests) are displayed in Treeherder.  There is a group of individuals who are constantly monitoring Treeherder, looking for broken builds and/or tests.  These individuals are known as sheriffs.  The sheriffs' role is to "keep the tree green", or in other words, to keep the code in our repositories in a good state, to the extent that that state is reflected in the output shown on Treeherder.  When sheriffs see that a build or test has broken, they are empowered to take one of several actions, including backing out the patch which caused the problem and closing the tree (i.e., preventing any additional commits).

    Results in Treeherder are ordered by mercurial pushes.  Each buildbot job is represented by a colored label; green means a job has succeeded, while other colors represent different kinds of problems.  The label text indicates the job type.  For a full list of job types, see the Help menu in Treeherder's upper-right corner.  Below is a list of the most common.

    Builds

    • B - Normal build jobs; these jobs perform compilation and some compiled-code tests (e.g., 'make check').
    • Be - B2G build jobs for engineering builds; user builds are denoted with B, the same as for desktop and Android builds.
    • N and Ne - Nightly build jobs; these jobs are similar to B and Be jobs, but are triggered on a periodic basis instead of being triggered by a push to hg.
    • Hf - Static rooting hazard analysis
    • S - Static analysis
    • V - Valgrind build and test jobs; these jobs create valgrind-compatible builds and run a small set of valgrind tests on them.

    Functional Tests

    These jobs are scheduled after a build job has successfully produced a build and uploaded it to ftp.mozilla.org.  These test jobs can sometimes run even if a build job is marked as failed, provided the failure occurred during 'make check' (i.e., after the build itself was produced and uploaded).

    See the full list of tests at the Mozilla Automated Testing page.

    Talos Performance Tests

    All performance tests run in buildbot and displayed in Treeherder are run using the Talos framework and are denoted by the letter T.  These jobs are scheduled at the same time as the correctness jobs.  Talos is used to execute several suites for desktop Firefox and Firefox for Android; these suites are denoted using lower-case letters, e.g., T(c d g1 o s tp).

    For a list of tests, see the Mozilla Automated Testing page.

    The Talos indicators in Treeherder appear green if the job completed successfully; to see the performance data generated by the jobs, you need to click on the graphserver or datazilla links that appear in Treeherder's lower pane when you click on the Talos job.

    Each Talos suite contains a set of tests or pages, some of which in turn have sub-tests.  Each test is executed multiple times to produce a number of data replicates.  The Talos harness produces a single number per test (typically the median of all the replicates, excluding the first 1-5) and reports it to graphserver.  Thus graphserver can be used to monitor coarse-grained performance metrics over time.  Talos also reports all its raw data to Datazilla, which can be used to examine performance metrics in greater detail.
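
    To illustrate that aggregation step, the sketch below drops a few warm-up replicates and takes the median of the rest, as described above.  The function name, the default number of dropped replicates, and the sample values are illustrative assumptions rather than Talos's actual code.

        # A minimal sketch of the aggregation step described above.  Talos's
        # real implementation differs in detail; dropping exactly one warm-up
        # replicate is an illustrative assumption (the text says "the first 1-5").
        from statistics import median

        def summarize_replicates(replicates, drop_first=1):
            """Drop warm-up replicates, then return the median of the rest."""
            return median(replicates[drop_first:])

        # Example: five replicates of a hypothetical page-load measurement (ms).
        print(summarize_replicates([310.2, 285.4, 290.1, 288.7, 287.9]))  # ~288.3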

    Note:  Treeherder will eventually collect and display the same raw Talos data as Datazilla, enabling us to retire both graphserver and Datazilla.  This capability is currently under development.


    TaskCluster

    TaskCluster is a new continuous integration system that may eventually take over some or all of the build and test automation currently run in buildbot.  It's currently in an experimental stage.  For status, ask in the #taskcluster IRC channel.

    Other Performance Systems

    Most of the performance tests run at Mozilla happen outside of buildbot.   Below is a list of these.

    Autophone (Android)

    Autophone is a test harness which runs a set of performance tests on a variety of real Android phones.  It reports to a custom dashboard known as phonedash.  The tests currently run are primarily startup tests.

    Eideticker (Android, B2G)

    Eideticker is a test harness which attempts to quantify performance on real Android and B2G devices using high-speed video capture.  Video captures are created while the phones are driven by the Eideticker harness, and then the captures are analyzed in a number of ways in order to answer questions like "how long did my app take to launch?" and "how much checkerboarding was there when a page was scrolled?".  Eideticker reports to its own custom dashboard (Android version, B2G version), but will likely report to Treeherder eventually as well.

    B2GPerf (B2G)

    B2GPerf is a simple harness which collects cold application startup metrics and reports them to Datazilla.  It will likely be superseded in the near future by 'make test-perf'.

    B2GPerf is run on a set of real B2G devices using Jenkins; the devices and branches tested vary over time depending on the needs of the Firefox OS Performance team.

    make test-perf (B2G)

    'make test-perf' is a Marionette-based Gaia performance test harness which collects a number of performance-related metrics for Gaia, including application memory consumption, various startup metrics, and other developer-defined metrics.  It reports to Datazilla.

    make test-perf is run on the same Jenkins instance as B2GPerf.

    Games Benchmarking (Firefox)

    Under development, the games benchmarking harness (aka mozbench) will allow a number of games-related benchmarks to be run against Firefox and Chrome.  Test results will be reported to Datazilla.  Eventually, the system will likely be expanded with support for Android and B2G.


    Other Functional Test Systems

    In contrast to performance tests, most functional tests run within the buildbot system.  However, a few run outside of it and are listed below.

    Gaia-ui-tests (B2G)

    Gaia-ui-tests are Python Marionette-based, on-device UI tests of Gaia, run in Jenkins across a range of device types and branches.  Test results are currently only available in Jenkins and are monitored by QA.  Eventually, these tests will report status to Treeherder as well.  Gaia-ui-tests are also run in buildbot against B2G desktop builds, and the status of those runs is visible in Treeherder.

    JSMarionette tests (B2G)

    The JSMarionette tests (aka gaia-integration tests) are based on the JavaScript Marionette client.  They're run on the same Jenkins instance as the gaia-ui-tests.  Like gaia-ui-tests, they are also run in buildbot against B2G desktop builds.  Gaia developers are the primary maintainers of this set of tests.


    Post-Job Analysis and Alerts

    There is some analysis of test data that occurs out-of-band after jobs complete. 

    Graphserver Alerts / dev.tree-management

    Talos posts aggregate metrics (or "scores") to graphserver, which in turn generates an alert when, after taking into account 12 consecutive data points (in push order) per branch, it detects a sustained change with a magnitude greater than twice the standard deviation.
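
    A rough sketch of that kind of windowed check is shown below.  It is not graphserver's actual analysis; in particular, splitting the 12-point window into equal "before" and "after" halves is an assumption made here for illustration.

        # Simplified sketch of the windowed check described above; graphserver's
        # actual analysis differs.  Splitting the 12-point window into two equal
        # halves is an assumption made here purely for illustration.
        from statistics import mean, stdev

        def find_sustained_changes(scores, window=12, threshold=2.0):
            """Return indices of pushes where the mean of the second half of a
            sliding window differs from the mean of the first half by more than
            `threshold` standard deviations of the first half."""
            alerts = []
            half = window // 2
            for i in range(len(scores) - window + 1):
                before = scores[i:i + half]
                after = scores[i + half:i + window]
                spread = stdev(before)
                if spread > 0 and abs(mean(after) - mean(before)) > threshold * spread:
                    alerts.append(i + half)  # first push of the suspected change
            return alerts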

    Outputs

    1. Posts are automatically sent to the newsgroup mozilla.dev.tree-management for each alert (improvement or regression) detected by graphserver.
    2. A Talos Alert Dashboard consolidates these alerts and assists in tracking them.

    Datazilla Alerts

    Performance data submitted to Datazilla (both Talos and B2G) is subjected to statistical analysis which can generate alerts on performance regressions.  For Talos, these alerts are generated using the higher-resolution data posted to Datazilla, rather than the lower-resolution data posted to Graphserver.  Datazilla Alerts for B2G and Talos are currently sent to different mailing lists:  fxos-perf-alerts for B2G regressions, and a custom list for Talos regressions.

    After performance data is migrated from Datazilla to Treeherder, this alerting system will continue to operate but will use Treeherder as its data source.

    OrangeFactor

    After functional tests complete, a separate system collects test log data and combines it with Treeherder's failure classification data.  The result is plotted on the OrangeFactor dashboard.  The "Orange Factor" is the average number of intermittent test failures that occur per push, and the dashboard can be used to view the most frequent intermittent test failures, as well as to inspect historical trends.
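
    The metric itself is a simple ratio; a toy calculation with invented per-push failure counts looks like this:

        # Toy calculation of the metric described above: total intermittent
        # ("orange") failures divided by the number of pushes.  The sample
        # counts are invented for illustration.
        failures_per_push = [3, 0, 5, 1, 2]   # one entry per push
        orange_factor = sum(failures_per_push) / len(failures_per_push)
        print(f"Orange Factor: {orange_factor:.2f}")   # -> Orange Factor: 2.20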

