Filter a Mercurial Changelog feed by Pushlog directory paths

As of early 2009, the Mozilla HG (Mercurial) changelog and pushlog feeds are too broad to watch for changes in a component or small project. The changelog entries are titled with a descriptive check-in comment, but do not list changed files. The pushlog feeds list the changed files, but do not have a descriptive title. Fortunately the two feeds both include a changeset id, so they can be matched in Yahoo Query Language (YQL) to produce a custom Yahoo Pipes feed that is restricted to a component or project directory. Unfortunately, the changelog feed is short (fixed at 20 entries, which is often less than two days of commits), so the filtered feed still must be retrieved daily to keep up with changes.

Two feeds, neither sufficient on its own

Consider the Calendar Project (Lightning/Sunbird), whose code resides in the calendar/ directory. The Atom Changelog feed (http://hg.mozilla.org/comm-central/atom-log) for the comm-central repository shows all commits from the Thunderbird, SeaMonkey, and Calendar projects. The Thunderbird commits greatly outnumber the Calendar commits, so for people who want to watch only Calendar commits the feed contains too many non-Calendar changes, but not enough information to filter them out.

<feed xmlns="http://www.w3.org/2005/Atom">
  ...
  <entry>
    <title>Bug 444444 fix MM to do JJ</title>
    <id>http://www.selenic.com/mercurial/#changeset-CCCCCCC</id>
    <link href="http://hg.mozilla.org/comm-central/rev/CCCCCCC"/>
    <updated>yyyy-MM-ddThh:mm:ssZ</updated>
    <author>
      <name>Cody Wright</name>
      <email>cody@example.com</email>
    </author>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <pre xml:space="preserve">Bug 444444 fix MM to do JJ</pre>
      </div>
    </content>
  </entry>
  ...
</feed>  

The Mozilla HG (Mercurial) Changelog feed (atom-log) takes no parameters and returns a list of checked-in changesets for a repository, such as the entry above. The query cannot be restricted to a directory, so all changes in the repository are included in the feed at the server. As you can see above, the result content includes the changeset id, the bug number, author, and descriptive check-in comment, but does not include the files affected. So even though many feed readers can filter entries by content, at the recipient it is impossible to filter this feed to restrict it to changes in the "calendar/" directory, based on the content of the feed alone.
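As a minimal sketch of what a recipient can recover from a changelog entry, the Python below parses a sample entry with the standard xml.etree.ElementTree module. The sample XML mirrors the placeholder entry above (with properly closed tags), and the helper name summarize_changelog_entry is an illustrative assumption, not part of any Mozilla or Mercurial tool.

```python
import re
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Placeholder entry modeled on the changelog sample; values are not real.
SAMPLE_CHANGELOG = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Bug 444444 fix MM to do JJ</title>
    <id>http://www.selenic.com/mercurial/#changeset-CCCCCCC</id>
    <link href="http://hg.mozilla.org/comm-central/rev/CCCCCCC"/>
    <author>
      <name>Cody Wright</name>
      <email>cody@example.com</email>
    </author>
  </entry>
</feed>"""


def summarize_changelog_entry(entry):
    """Collect the fields a changelog entry offers (hypothetical helper)."""
    title = entry.findtext(ATOM + "title", default="")
    entry_id = entry.findtext(ATOM + "id", default="")
    bug = re.search(r"\bbug\s+(\d+)", title, re.IGNORECASE)
    return {
        "changeset": entry_id.rsplit("changeset-", 1)[-1],
        "bug": bug.group(1) if bug else None,
        "author": entry.findtext(ATOM + "author/" + ATOM + "name", default=""),
        "title": title,
    }


feed = ET.fromstring(SAMPLE_CHANGELOG)
summaries = [summarize_changelog_entry(e) for e in feed.iter(ATOM + "entry")]
print(summaries[0])
```

Nothing in the resulting dictionary names a file or directory, which is why this feed cannot be filtered by path on its own.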

The Calendar project commits also appear in the Pushlog feed (http://hg.mozilla.org/comm-central/pushlog) for the comm-central repository, along with commits from the Thunderbird and SeaMonkey projects. The pushlog feed lists the changed files in each entry, so it is possible to filter the entries by directory. However, the title of each entry is just the changeset id, and the content is just the list of changed files, so it is not useful for people who want a summary of what changed in the calendar code, and why.

<feed xmlns="http://www.w3.org/2005/Atom">
  ...
  <entry>
    <title>Changeset CCCCCCC</title>
    <id>http://www.selenic.com/mercurial/#changeset-CCCCCCC</id>
    <link href="http://hg.mozilla.org/comm-central/rev/CCCCCCC"/>
    <updated>yyyy-MM-ddThh:mm:ssZ</updated>
    <author>
      <name>cody@example.com</name>
    </author>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <ul class="filelist">
          <li class="file">calendar/base/content/file1.js</li>
          <li class="file">calendar/base/content/file2.xul</li>
        </ul>
      </div>
    </content>
  </entry>
  ...
</feed>  

The Mozilla HG Pushlog feed (pushlog) takes startdate and enddate parameters and returns a list of checked-in changesets for a repository, such as the entry above. The query cannot be restricted to a directory, so all changes in the repository are included in the feed at the server. As you can see above, its content includes the file paths, so it is possible to filter the entries by directory. The content also includes the changeset id and the author, but neither the descriptive comment nor the bug id. Therefore, while it is possible to filter this feed, reading it does not provide a summary of what changed inside the files, and why.
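The directory filter that the pushlog makes possible can be sketched in Python as follows. This is an illustration under stated assumptions: the sample feed is modeled on the placeholder entry above, the second entry (changeset DDDDDDD and its paths) is invented for contrast, and the helper names changed_files and touches_directory are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
XHTML = "{http://www.w3.org/1999/xhtml}"

# Placeholder pushlog feed; the second entry is a made-up non-calendar push.
SAMPLE_PUSHLOG = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Changeset CCCCCCC</title>
    <id>http://www.selenic.com/mercurial/#changeset-CCCCCCC</id>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <ul class="filelist">
          <li class="file">calendar/base/content/file1.js</li>
          <li class="file">calendar/base/content/file2.xul</li>
        </ul>
      </div>
    </content>
  </entry>
  <entry>
    <title>Changeset DDDDDDD</title>
    <id>http://www.selenic.com/mercurial/#changeset-DDDDDDD</id>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <ul class="filelist">
          <li class="file">mail/base/content/file3.js</li>
          <li class="file">suite/calendar/notes.txt</li>
        </ul>
      </div>
    </content>
  </entry>
</feed>"""


def changed_files(entry):
    """Return the file paths listed in one pushlog entry."""
    return [
        li.text.strip()
        for li in entry.iter(XHTML + "li")
        if li.get("class") == "file" and li.text
    ]


def touches_directory(entry, directory):
    """True if any changed path starts with the given directory prefix."""
    return any(path.startswith(directory) for path in changed_files(entry))


feed = ET.fromstring(SAMPLE_PUSHLOG)
calendar_ids = [
    entry.findtext(ATOM + "id")
    for entry in feed.iter(ATOM + "entry")
    if touches_directory(entry, "calendar/")
]
print(calendar_ids)
```

Matching on the start of each path keeps an entry that only touches suite/calendar/ out of the result. The ids that survive are all this feed can contribute; the descriptive titles still have to come from the changelog feed.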

Combining the feeds with Yahoo Query Language (Yahoo Pipes)

Yahoo Query Language (YQL) is an SQL-like language that can treat atom feeds as input tables. Below is code that selects entries from the comm-central atom-log. Each atom-log entry must have an id which matches an id of an entry in the comm-central pushlog over the last two days, and that pushlog entry must contain the directory "calendar/".

select * from atom 
where url='http://hg.mozilla.org/comm-central/atom-log' 
and id in 
(select id from atom
 where url='http://hg.mozilla.org/comm-central/pushlog?startdate=2+days+ago&enddate=now'
 and content like '% calendar/%')

This query can easily be adapted for

  • another directory in the same repository
    [change calendar/, but keep the wildcard % at each end, and keep the leading space as in '% calendar/%' so that it matches only when calendar/ is at the start of a path],
  • multiple directories in the same repository
    [replace content like '% calendar/%'
    with (content like '% calendar/base/public/%' or content like '% calendar/providers/%'), including the parentheses],
  • another repository on the same server [change comm-central], or
  • another server [change hg.mozilla.org, but check whether it actually implements a compatible atom-log and pushlog, as Mozilla has customized them].
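The adaptations listed above can be captured in a small query builder. The Python below is only a sketch: the function name build_yql_query and its parameters are hypothetical, and it simply assembles the same select statement text shown earlier, with the directory list and repository URL substituted in.

```python
def build_yql_query(directories,
                    repo_url="http://hg.mozilla.org/comm-central",
                    startdate="2 days ago", enddate="now"):
    """Assemble the filtering YQL statement for one or more directories."""
    if not directories:
        raise ValueError("at least one directory is required")
    for directory in directories:
        # Keep the pattern well formed: no quotes, and a trailing slash.
        if "'" in directory or not directory.endswith("/"):
            raise ValueError("unexpected directory: %r" % directory)

    # The leading space in '% dir/%' is kept, as in the original query.
    clauses = ["content like '% {}%'".format(d) for d in directories]
    if len(clauses) == 1:
        condition = clauses[0]
    else:
        condition = "(" + " or ".join(clauses) + ")"

    pushlog_url = "{}/pushlog?startdate={}&enddate={}".format(
        repo_url, startdate.replace(" ", "+"), enddate.replace(" ", "+"))

    return "\n".join([
        "select * from atom",
        "where url='{}/atom-log'".format(repo_url),
        "and id in",
        "(select id from atom",
        " where url='{}'".format(pushlog_url),
        " and {})".format(condition),
    ])


print(build_yql_query(["calendar/"]))
print()
print(build_yql_query(["calendar/base/public/", "calendar/providers/"]))
```

Passing a different repo_url covers the "another repository" and "another server" cases, subject to the compatibility caveat above.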

You can try the query in the YQL console (free Yahoo registration required).

 

Making this query directly to query.yahooapis.com/v1/public/yql can produce an XML file, but not an atom or RSS feed. To create a feed the results must be reformatted. The simplest way to do that is to create a Yahoo "Pipe".
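For reference, the direct request mentioned above might be assembled as in the Python sketch below. It assumes the public endpoint accepts the statement in a q parameter and the output type in a format parameter, and it assumes an http:// scheme; check both against the YQL documentation before relying on them. The function name yql_request_url is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint host and path as named in the text; the scheme is an assumption.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

QUERY = (
    "select * from atom "
    "where url='http://hg.mozilla.org/comm-central/atom-log' "
    "and id in "
    "(select id from atom "
    "where url='http://hg.mozilla.org/comm-central/pushlog"
    "?startdate=2+days+ago&enddate=now' "
    "and content like '% calendar/%')"
)


def yql_request_url(query, result_format="xml"):
    """Percent-encode the statement into a GET URL (assumed parameters)."""
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": result_format})


url = yql_request_url(QUERY)
print(url)

# Decoding the URL again should give back the statement unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == QUERY)
```

The encoding step matters because the statement itself contains +, & and % characters, which would otherwise be read as part of the outer request.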

Creating a Yahoo Pipe

Creating a Yahoo pipe for a YQL query is very simple.

  1. Log in to pipes.yahoo.com (free Yahoo registration required).
  2. Click "Create a Pipe".
    This takes you to a graphical programming language page. In the left column will be a tree of components that can be dragged onto the page.
  3. From "Sources", drag the "YQL" component onto the page (or just click its "+" icon).
    The "YQL" component will appear on the page with a text field for the query. (Since it was the first component, the "Pipe output" component now also appears.)
  4. Copy the entire text of the select query into the YQL component's text area.
    select * from atom 
    where url='http://hg.mozilla.org/comm-central/atom-log' 
    and id in 
    (select id from atom
     where url='http://hg.mozilla.org/comm-central/pushlog?startdate=2+days+ago&enddate=now'
     and content like '% calendar/%')
    
  5. At the bottom of the "YQL" component is a circle, its output. At the top of the "Pipe output" component is another circle, its input. Connect them by dragging (hold down the mouse button) from the "YQL" output circle to the "Pipe Output" input circle; a flexible pipe will appear while you drag the mouse, and will connect the two circles after you release the mouse button.
    You can test the result by clicking on the output component, then clicking refresh in the debugger pane at the bottom of the screen. If it is working, a list/tree of entries will appear; clicking on the title will expand the entry to show the other fields. (If the YQL is not working, the YQL console may be a better YQL debugging tool since it shows error messages.)
  6. Click the "Save" button near the top of the window. You will need to give your pipe a name. This sample query can be called "MozCalendar: Recent changes"
  7. Click "Run Pipe" near the top of the window.
    The page for this pipe will appear, with a pane showing the current results. Above the results there are buttons and links, including "Get as RSS", a link that produces the results as an RSS feed.

On the day this was written, the comm-central Changelog feed showed 20 entries. The MozCalendar feed just created showed two entries with their descriptive titles confirming they are calendar project changes. It's a useful filter, but since the changelog feed is fixed at 20 entries, and 20 entries often covers less than two days of commits, the feed must still be updated daily to catch all changes.

If you use Thunderbird to read the RSS newsfeed, the Bug ID Helper extension will linkify references to "bug NNNNN" in message bodies, and provide a customizable tooltip that can look up the bug's Bugzilla status and title.

Limits

Here's the RSS link for the original pipe used in this example. You may try it out, but please create your own pipe if you plan to use it frequently. There are YQL request rate limits per application pipe as well as per IP address. And please don't abuse hg.mozilla.org with very frequent queries.

YQL is in beta, so terms and features are subject to change. YQL is sufficient to filter one feed using information from another feed, as above, but so far there seems to be no way to create a feed in which each entry combines information from two feeds (no SQL joins, no select expression in the column list, only a single "from" table). So it does not (yet) appear possible to create a feed that has both the informative title and the list of changed files in one message. This restriction may be one way to limit the computational cost of the free service, in addition to the maximum time limits (YQL Guide (pdf), Ch4: 4 seconds fetch, 30 seconds total).
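Outside YQL, the combination is straightforward once both feeds have been downloaded. The Python sketch below is not a Pipes or YQL feature; it is a client-side illustration, with invented sample feeds (changeset DDDDDDD and its title are made up) and hypothetical helper names, of joining the two feeds on their shared id so that each record carries both the title and the file list.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
XHTML = "{http://www.w3.org/1999/xhtml}"
ID_PREFIX = "http://www.selenic.com/mercurial/#changeset-"

# Placeholder feeds; real ones would be fetched from atom-log and pushlog.
CHANGELOG = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Bug 444444 fix MM to do JJ</title>
    <id>{0}CCCCCCC</id></entry>
  <entry><title>Bug 555555 fix a mail issue</title>
    <id>{0}DDDDDDD</id></entry>
</feed>""".format(ID_PREFIX)

PUSHLOG = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>{0}CCCCCCC</id><content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml"><ul class="filelist">
      <li class="file">calendar/base/content/file1.js</li>
      <li class="file">calendar/base/content/file2.xul</li>
    </ul></div></content></entry>
  <entry><id>{0}DDDDDDD</id><content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml"><ul class="filelist">
      <li class="file">mail/base/content/file3.js</li>
    </ul></div></content></entry>
</feed>""".format(ID_PREFIX)


def files_by_changeset(pushlog_xml):
    """Map each pushlog entry id to its list of changed file paths."""
    result = {}
    for entry in ET.fromstring(pushlog_xml).iter(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id", default="")
        result[entry_id] = [
            li.text.strip() for li in entry.iter(XHTML + "li") if li.text
        ]
    return result


def merge_feeds(changelog_xml, pushlog_xml, directory):
    """Join changelog titles with pushlog files, filtered by directory."""
    files = files_by_changeset(pushlog_xml)
    merged = []
    for entry in ET.fromstring(changelog_xml).iter(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id", default="")
        paths = files.get(entry_id, [])
        if any(path.startswith(directory) for path in paths):
            merged.append({
                "id": entry_id,
                "title": entry.findtext(ATOM + "title", default=""),
                "files": paths,
            })
    return merged


merged = merge_feeds(CHANGELOG, PUSHLOG, "calendar/")
for record in merged:
    print(record["title"])
    for path in record["files"]:
        print("  " + path)
```

A script like this would still be bound by the 20-entry limit of the changelog feed, so it would need to run at least as often as the pipe is polled.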

 

Document Tags and Contributors

Contributors to this page: gekacheka
Last updated by: gekacheka