Performance

  • Revision slug: Storage/Performance
  • Revision title: Performance
  • Revision id: 60948
  • Created:
  • Creator: sdwilsh
  • Is current revision? No
  • Comment: Removing some very wrong and obsolete information; 725 words removed

Revision Content

 

mozStorage uses {{ mediawiki.interwiki('wikipedia', 'SQLite', 'SQLite') }} as the database backend. It has generally good performance for a small embedded database. However, a number of factors can make database operations slow.

Transactions

There is overhead associated with each transaction. When you execute a SQL statement in isolation, an implicit transaction is created around that statement. When a transaction is committed, sqlite performs journaling, which requires syncing data to disk. This operation is extremely slow. Therefore, if you are executing many statements in a row, you will get a significant performance improvement by wrapping them in a single explicit transaction.
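
For example, a batch of inserts is much faster inside one explicit transaction than as separate statements (the table and column names here are purely illustrative):

BEGIN TRANSACTION;
INSERT INTO example_table (value) VALUES ('first');
INSERT INTO example_table (value) VALUES ('second');
INSERT INTO example_table (value) VALUES ('third');
-- only one commit, and therefore only one round of journaling and
-- disk syncing, for the whole batch
COMMIT;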

If you are not using the advanced caching discussed below, the database cache in memory is cleared at the end of each transaction. This is another reason to use transactions, even if you are only executing read operations.

Asynchronous writes remove most of the immediate penalty of a commit, so you will not notice the problem as much. However, there is still overhead, and using a transaction will still be faster. One major problem is that the queue of pending file operations grows very long when there are many transactions. Some operations must walk this queue to see which operations are still pending, and they become slower as the queue grows. If the user shuts down the browser soon after you do this type of operation, you can delay shutdown (possibly by many tens of seconds for large numbers of transactions and slow disks), making it look as though the browser has hung.

Queries

Careful reordering of the SQL statement, or creating the proper indices, can often improve performance. See the sqlite optimizer overview on the sqlite web site for information on how sqlite uses indices and executes statements.

You might also want to try the "explain" feature on your statements to see whether they are using the indices you expect. Type "explain" followed by your statement to see the plan; for example, explain select * from moz_history; The results are hard to interpret, but you should be able to see whether an index is being used. A command that gives a higher-level overview is "explain query plan". For example:

sqlite> explain query plan select * from moz_historyvisit v join moz_history h
        on v.page_id = h.id where v.visit_date > 1000000000;

0|0|TABLE moz_historyvisit AS v WITH INDEX moz_historyvisit_dateindex
1|1|TABLE moz_history AS h USING PRIMARY KEY

This tells us that it will first look up in moz_historyvisit using an index, and will then look up in moz_history using the primary key. Both of these are "good" because they use indices and primary keys, which are fast.

sqlite> explain query plan select * from moz_historyvisit where session = 12;

0|0|TABLE moz_historyvisit

In this example, you can see that it is not using an index, so this query would be slow.
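
If you filter on a column frequently, creating an index on it usually changes the plan (the index name below is only an example, and the output is shown in the same format as above):

sqlite> create index moz_historyvisit_sessionindex on moz_historyvisit (session);
sqlite> explain query plan select * from moz_historyvisit where session = 12;

0|0|TABLE moz_historyvisit WITH INDEX moz_historyvisit_sessionindex

Keep in mind that every index also has a cost: it takes up space in the database and must be updated on every insert and delete.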

You can download the command line tool from the sqlite download page. Be sure you have a version of the command line tool that is at least as recent as the one Mozilla uses. As of April 10, 2006, Mozilla uses sqlite 3.3.4, but the latest precompiled version of the command line tool is not available for all platforms. Using an older tool will lead to errors that say "database is encrypted", because the tool does not recognize the file format. You may want to check the SQLITE_VERSION definition in {{ Source("db/sqlite3/src/sqlite3.h") }} for the current version if you are having problems.
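
To check which SQLite version a command line tool was built against, you can ask it directly; it should report at least the version Mozilla ships (3.3.4 at the time of writing):

sqlite> select sqlite_version();
3.3.4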

Caching

Sqlite has a cache of database pages in memory. It keeps pages associated with the current transaction so it can roll them back, and it also keeps recently used ones so it can run faster.

By default, it only keeps the pages in memory during a transaction (if you don't explicitly open a transaction, one will be opened for you enclosing each individual statement). At the end of a transaction, the cache is flushed. If you subsequently begin a new transaction, the pages you need will be re-read from disk (or hopefully the OS cache). This makes even simple operations block on OS file reads, which can be prohibitive on some systems with bad disk caches or networked drives.

You can control the size of the memory cache using the cache_size pragma. This value controls the number of pages of the file that can be kept in memory at once. The page size can be set using the page_size pragma before any operations have been done on the file. You can see an example of setting the maximum cache size to be a percentage of memory in nsNavHistory::InitDB in {{ Source("toolkit/components/places/src/nsNavHistory.cpp") }}.
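
A minimal sketch of setting these pragmas from SQL (the values are purely illustrative; nsNavHistory computes its cache size from the amount of physical memory rather than hard-coding it):

-- page_size must be set before any data has been written to the file
PRAGMA page_size = 4096;
-- cache_size is a number of pages: 2000 pages at 4 KB each is roughly 8 MB
PRAGMA cache_size = 2000;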

Keeping the cache between transactions

Storage enables the sqlite shared-cache mode, which makes multiple connections to the same database share the same cache. Because the cache is not threadsafe, this unfortunately means that you cannot have connections from different threads accessing the same database. However, the shared cache allows us to keep it live between transactions, instead of clearing it after each transaction as sqlite does by default.

If your application uses many small transactions, you can get a significant performance improvement by keeping the cache live between transactions. This is done by using an extra "dummy" connection to the same database (it is important that you use exactly the same filename, as compared by strcmp, when opening these connections). The dummy connection keeps a perpetually open transaction, which locks the cache in memory. Since the cache is shared with the main connection, the cache never expires.

The dummy transaction must be one that locks a page in memory. A simple BEGIN TRANSACTION statement doesn't do this because sqlite does the locking lazily. Therefore, you must have a statement that actually touches data. It might be tempting to run a statement against sqlite_master, which contains the information about the tables and indices in the database. However, if your application is initializing the database for the first time, this table will be empty and the cache won't be locked. nsNavHistory::StartDummyStatement creates a dummy table with a single element in it for this purpose.
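
A sketch of the idea in SQL (the table name and statement are illustrative; in the real implementation, nsNavHistory::StartDummyStatement creates the table and holds the statement open through the storage API rather than from the shell):

-- make sure a table with at least one row exists, so that a page
-- can actually be locked in the cache
CREATE TABLE IF NOT EXISTS dummy_table (id INTEGER PRIMARY KEY);
INSERT OR IGNORE INTO dummy_table VALUES (1);
-- the dummy connection then keeps a statement like this one prepared
-- and stepped, but never finalized, so its transaction stays open
SELECT id FROM dummy_table LIMIT 1;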

It is important to note that when a statement is open, the database schema cannot be modified. This means that when the dummy transaction is running, you cannot create or modify any tables or indices, or vacuum the database. You will have to stop the dummy transaction, do the schema-modifying operation, and restart it.

Disk writes

Sqlite provides the basic ACID properties of database theory:

  • Atomicity: each transaction is atomic and cannot be "partially" committed.
  • Consistency: the database won't get corrupted.
  • Isolation: multiple transactions do not affect each other.
  • Durability: once a commit has gone through, the data is guaranteed to be on disk, even if the application or system crashes immediately afterwards.

The problem is that these requirements make some operations, especially commits, very slow. For each commit, sqlite does two disk syncs among many other file operations (see the "Atomic Commit" section of http://www.sqlite.org/php2004/slides-all.html for more information on how this works). These disk syncs are very slow and limit the speed of a commit to the rotational speed of the mechanical disk.
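
For reference, how aggressively SQLite syncs is controlled by the synchronous pragma; relaxing it trades durability for speed, so the value below is illustrative rather than a recommendation:

-- FULL (the default) syncs on every commit; NORMAL syncs less often;
-- OFF never syncs, which is fast but risks corruption on power loss
PRAGMA synchronous = NORMAL;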

Vacuuming and zero-fill

Sqlite has a VACUUM command to reclaim unused space in the database. Sqlite works like a memory manager or a file system: when data is deleted, the associated bytes are marked as free but are not removed from the file. This means that the file will not shrink, and some deleted data may still be visible in the file. The way to reclaim this space is to run the VACUUM command.

Vacuuming is very slow. The vacuum command is essentially the same as the command line sqlite3 olddb .dump | sqlite3 newdb; mv newdb olddb. On some networked drives, vacuuming a 10MB database has been timed at over one minute. Therefore, you should avoid vacuuming whenever possible.

Some items in databases are privacy sensitive, such as deleted history items. Users expect that deleting items from their history removes all traces of them from the database. As a result, Mozilla enables the SQLITE_SECURE_DELETE preprocessor flag in {{ Source("db/sqlite3/src/Makefile.in") }}. This flag causes deleted items to be overwritten with 0s on disk. This eliminates the need to vacuum except to reclaim disk space, and makes many operations much faster.

Zero-filling can have significant performance overhead in some situations. For example, the history service used to delete many database items at shutdown when expiring old history items. This operation is not necessarily slow in itself, but writing the 0s to disk in an "ACI" database still is. This made shutdown very slow because the AsyncIO thread would block shutdown ({{ Bug("328598") }}); shutdown times of more than 30 seconds were seen. As a result, that bug introduced incremental history expiration, eliminating the need to write many 0s to disk at shutdown.

Unfortunately, zero-filling cannot be controlled on a per-transaction or per-connection basis. Some operations will benefit, while others will be hurt.

{{ languages( { "pl": "pl/Storage/Wydajno\u015b\u0107" } ) }}
