Gloda indexing

This page provides a big-picture summary of what the indexer does; please see the source for nitty-gritty details or if this page seems to be wrong. GlodaIndexer provides the core indexing logic. GlodaMsgIndexer has the message-specific stuff, although the actual attribute providers are found in GlodaExplicitAttr and GlodaFundAttr. GlodaABIndexer has the limited address book support.

What Folders get Indexed?

  • Never Indexed
    • per priority policy:
      • Trash folders
      • Junk folders
      • Queue folders (outbox?)
      • Newsgroup folders
    • per hard-coded do-not-go-in-there logic:
      • non-Mail folders (we do a bitmask test on the folder flags)
      • Virtual folders (we do a bitmask test on the folder flags)
      • folders which are neither local nor IMAP (we do an instanceof test to make sure the implementations are either local folders or IMAP folders)
  • Indexed by priority, where higher numbers are indexed before lower numbers:
    • 60:
      • Sent mail
    • 50:
      • Inboxes
    • 40:
      • Favorite folders -- You can right-click on a folder and mark it as a favorite. Note that gloda currently does not listen for this attribute changing on a folder, so this priority is likely to be inconsistently applied.
    • 30:
      • CheckNew -- This might come from a secret pref or something. Just like the favorite attribute, this is likely to be stale.
    • 20:
      • Default; everybody else.
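
The folder policy above can be sketched as a small lookup function. This is only an illustration in Python, not the actual gloda code: the FolderFlag names and bit values, the is_local_or_imap parameter, and the helper names are all hypothetical, and the order in which the exclusions are checked is an assumption.

```python
from enum import IntFlag
from typing import Dict, List, Optional


class FolderFlag(IntFlag):
    """Hypothetical folder flags; the bit values are illustrative only."""
    MAIL = 1
    VIRTUAL = 2
    TRASH = 4
    JUNK = 8
    QUEUE = 16
    NEWSGROUP = 32
    SENT_MAIL = 64
    INBOX = 128
    FAVORITE = 256
    CHECK_NEW = 512


# Folder kinds that the priority policy never indexes.
NEVER_INDEX = (FolderFlag.TRASH | FolderFlag.JUNK
               | FolderFlag.QUEUE | FolderFlag.NEWSGROUP)

# (flag, priority) pairs, checked from highest priority to lowest.
PRIORITIES = [
    (FolderFlag.SENT_MAIL, 60),
    (FolderFlag.INBOX, 50),
    (FolderFlag.FAVORITE, 40),
    (FolderFlag.CHECK_NEW, 30),
]
DEFAULT_PRIORITY = 20


def indexing_priority(flags: FolderFlag, is_local_or_imap: bool) -> Optional[int]:
    """Return the folder's indexing priority, or None if it is never indexed."""
    # Hard-coded exclusions: bitmask tests plus the folder-type test.
    if not flags & FolderFlag.MAIL or flags & FolderFlag.VIRTUAL:
        return None
    if not is_local_or_imap:
        return None
    # Policy exclusions (assumed here to win over any priority flag).
    if flags & NEVER_INDEX:
        return None
    for flag, priority in PRIORITIES:
        if flags & flag:
            return priority
    return DEFAULT_PRIORITY


def indexing_order(folders: Dict[str, FolderFlag]) -> List[str]:
    """Order folder names by descending priority, dropping unindexed ones.

    For brevity this assumes every folder is a local or IMAP folder.
    """
    ranked = [(indexing_priority(flags, True), name)
              for name, flags in folders.items()]
    indexable = [(p, name) for p, name in ranked if p is not None]
    return [name for p, name in sorted(indexable, key=lambda item: -item[0])]
```

Under these assumptions, a Sent folder would be visited before an Inbox, which in turn comes before an ordinary folder, while a Trash folder is skipped entirely.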

What Messages get Indexed?

All local (POP) messages get indexed. IMAP messages get indexed according to the following rules, which depend on whether the folder is marked for offline storage and whether the message itself is offline:

  • The folder IS marked for offline storage:
    • The message IS offline: We index the message including its body. This means attribute providers get to see all of the message headers and the contents of the message (converted to plaintext) are available for fulltext queries.
    • The message is NOT offline: We wait for the message to get brought offline, likely by autosync. The message is invisible to gloda until this time.
  • The folder is NOT marked for offline storage:
    • The message IS offline: This can't happen.
    • The message is NOT offline: We index the message based on the nsMsgDBHdr but do not stream the message. Attribute providers only get to see what is already on the nsMsgDBHdr; they do not get to see all headers. The message body is not available for fulltext queries.
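
The four IMAP cases, plus the local (POP) case, can be written as one decision function. This is a hedged sketch in Python rather than the indexer's real logic: IndexAction and message_index_action are hypothetical names, and treating local messages as "indexed with body" is an assumption, since the text only says that they get indexed.

```python
from enum import Enum


class IndexAction(Enum):
    """Hypothetical labels for the outcomes described above."""
    INDEX_WITH_BODY = "index headers and body"
    WAIT_FOR_OFFLINE = "wait until the message is brought offline"
    INDEX_HEADER_ONLY = "index from the nsMsgDBHdr only"


def message_index_action(is_imap: bool,
                         folder_offline: bool,
                         message_offline: bool) -> IndexAction:
    """Pick what to do with a message given the offline state."""
    if not is_imap:
        # Local (POP) messages always get indexed (body assumed available).
        return IndexAction.INDEX_WITH_BODY
    if folder_offline:
        if message_offline:
            return IndexAction.INDEX_WITH_BODY
        # Invisible to the index until autosync (or similar) fetches it.
        return IndexAction.WAIT_FOR_OFFLINE
    if message_offline:
        # An offline message in a folder not marked for offline storage
        # is described as impossible, so treat it as an error here.
        raise ValueError("offline message in a non-offline folder")
    return IndexAction.INDEX_HEADER_ONLY
```

Raising an error for the impossible combination is a design choice of this sketch; real code might instead log the inconsistency and fall back to header-only indexing.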

What Parts of Messages get Full-text Indexed?

  • Always
    • The message author's display name and e-mail address *as present in the e-mail message*.  We do not involve any nickname you've applied to the contact.
    • The message recipients' display name and e-mail addresses *as present in the e-mail message*.
    • The message subject line.
  • Only if the message body was available:
    • The body text.  If the message had an HTML part without a corresponding text part, we convert the HTML part to text.
    • The attachment names.

Because the author, recipients, and subject line are always indexed, a gloda full-text search still works when messages are not available offline. The quality of the search is greatly reduced, however, because the body text and attachment names are missing.
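
To make the split between "always" and "only if the body was available" concrete, here is a minimal Python sketch of how the full-text fields could be assembled. The Message class, its field names, and fulltext_fields are hypothetical, and the tag-stripping html_to_text helper is only a stand-in for whatever HTML-to-text conversion the real indexer performs.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Dict, List, Optional


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Very rough HTML-to-text conversion (illustrative stand-in only)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(parser.chunks).split())


@dataclass
class Message:
    """Hypothetical message record; not a Thunderbird type."""
    author: str                      # as present in the message itself
    recipients: List[str]            # as present in the message itself
    subject: str
    body_available: bool = False
    body_text: Optional[str] = None
    body_html: Optional[str] = None
    attachment_names: List[str] = field(default_factory=list)


def fulltext_fields(msg: Message) -> Dict[str, str]:
    """Return the text that would be handed to the full-text index."""
    # Always indexed: author, recipients, and subject.
    fields = {
        "author": msg.author,
        "recipients": ", ".join(msg.recipients),
        "subject": msg.subject,
    }
    # Only indexed when the body was available (i.e. streamed).
    if msg.body_available:
        if msg.body_text is not None:
            fields["body"] = msg.body_text
        else:
            # HTML part without a corresponding text part.
            fields["body"] = html_to_text(msg.body_html or "")
        fields["attachment_names"] = " ".join(msg.attachment_names)
    return fields
```

A header-only message yields just the three always-present fields, which is why such messages can still be found by author, recipient, or subject but not by body text or attachment name.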

Document Tags and Contributors

Contributors to this page: chrisdavidmills, mhammond, AndrewSutherland
Last updated by: chrisdavidmills