Shotwell Architecture Overview: Performance via Aggregation and Transactions
As a media manager, Shotwell is expected to deal with large collections of photos and videos and their organizing containers, tags and events. It's tempting when writing code to perform an operation on a selected group of objects one at a time. This works when a small number of objects is affected, but doesn't scale when the number is huge. Shotwell has systems in place that attempt to deal with operations on groups all at once, or at least in such a way as to minimize their impact. This strategy is continually evolving.
1. Signal Aggregation
The first step toward this is in the data architecture in the form of signal aggregation. If several objects are added at once to a DataCollection, instead of firing a separate "added" signal for each one, an "items-added" signal is fired supplying a list of all added items. It's a simple example, but these pluralized signals mean that each signal handler is called once per batch rather than once per object.
Poorly written code can circumvent this by adding objects one at a time in a loop rather than gathering them together and submitting them at once. Good interfaces help prevent this, but it still must be watched for by human eyes.
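The difference between a batched add and a loop of single adds can be sketched as follows. This is a minimal illustration in Python, not Shotwell's code (Shotwell itself is written in Vala); the toy DataCollection class, its add_many() and add() methods, and the handler list are hypothetical names chosen for the example.

```python
from typing import Callable, List


class DataCollection:
    """Toy stand-in for a collection that reports additions in bulk."""

    def __init__(self) -> None:
        self.items: List[str] = []
        # Handlers for the pluralized "items-added" signal (hypothetical API).
        self.items_added_handlers: List[Callable[[List[str]], None]] = []

    def add_many(self, new_items: List[str]) -> None:
        """Add every item, then notify each handler once with the full list."""
        added = list(new_items)
        self.items.extend(added)
        if added:
            for handler in self.items_added_handlers:
                handler(added)

    def add(self, item: str) -> None:
        """Single-item convenience; each call emits its own signal."""
        self.add_many([item])


calls: List[List[str]] = []
collection = DataCollection()
collection.items_added_handlers.append(calls.append)

photos = [f"photo-{i}.jpg" for i in range(1000)]

# Batched: the handler runs once and receives all 1000 items.
collection.add_many(photos)
batched_calls = len(calls)

# Anti-pattern: adding in a loop runs the handler 1000 times.
calls.clear()
for photo in photos:
    collection.add(photo)
looped_calls = len(calls)
```

If each handler call triggers expensive work such as re-sorting or redrawing a view, the looped version repeats that work once per object.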
2. Freezing and Thawing
Sometimes it's not as simple as gathering all the objects and submitting them at once. Sometimes multiple objects must be called in succession to perform their operations, or layering prevents an easy way to gather objects together. In this case, DataCollections may be frozen at the start of a large operation set and thawed at the end.
A frozen DataCollection suppresses a subset of its signals when changes occur. For the base DataCollection class, the suppressed signals are "items-altered" and "ordering-changed". For example, if LibraryPhoto.global is frozen, any number of Photos may be altered (e.g. rotated) and none of LibraryPhoto.global's observers will be notified. Instead, the signals are aggregated in an internal list.
When the DataCollection is thawed, all the signals that should have been fired are fired at once. Thus, freezing and thawing is a special case of signal aggregation.
Calls to DataCollection's freeze and thaw methods may be nested.
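One way to model nested freezing is a depth counter plus a pending list, flushed only when the outermost thaw is reached. The Python sketch below is an assumption about how such a mechanism could work, not a transcription of Shotwell's Vala implementation; FreezableCollection, notify_altered() and the handler list are hypothetical names.

```python
from typing import Callable, List


class FreezableCollection:
    """Toy collection that defers "items-altered" signals while frozen."""

    def __init__(self) -> None:
        self._freeze_depth = 0
        self._pending_altered: List[str] = []
        self.items_altered_handlers: List[Callable[[List[str]], None]] = []

    def freeze(self) -> None:
        # Nested calls only increase the depth.
        self._freeze_depth += 1

    def thaw(self) -> None:
        if self._freeze_depth == 0:
            raise RuntimeError("thaw() called without a matching freeze()")
        self._freeze_depth -= 1
        # Only the outermost thaw flushes the aggregated signals.
        if self._freeze_depth == 0 and self._pending_altered:
            pending, self._pending_altered = self._pending_altered, []
            self._emit(pending)

    def notify_altered(self, item: str) -> None:
        if self._freeze_depth > 0:
            self._pending_altered.append(item)  # hold until thawed
        else:
            self._emit([item])

    def _emit(self, items: List[str]) -> None:
        for handler in self.items_altered_handlers:
            handler(items)


emitted: List[List[str]] = []
library = FreezableCollection()
library.items_altered_handlers.append(emitted.append)

library.freeze()
library.notify_altered("a.jpg")
library.freeze()                 # nested freeze, e.g. from a helper routine
library.notify_altered("b.jpg")
library.thaw()                   # inner thaw: collection is still frozen
inner_thaw_emissions = len(emitted)
library.thaw()                   # outer thaw: one aggregated signal fires
```

After the outer thaw, the single handler call carries both altered items, which is the sense in which freezing and thawing is a special case of signal aggregation.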
3. Transactions
Freezing a collection allows signal emission to be controlled, but signals are only part of the cost. Since most DataSources have a backing representation that must be updated during operations, their respective SourceCollections may offer a TransactionController. Currently only MediaSourceCollections are required to offer one; this may be made a requirement for all SourceCollections in the future.
A TransactionController is an abstract class that mirrors the freeze/thaw characteristics of the DataCollection. Its interface is begin() and commit(), reflecting the database terminology it sprang from. Each SourceCollection is responsible for implementing its own TransactionController. In the case of LibraryPhotos and Videos, their TransactionController.begin() operations (a) freeze their respective SourceCollections and (b) open an SQLite transaction. If tens or hundreds of database operations occur back-to-back, it's far more efficient to group them as a single transaction to the database. TransactionController takes advantage of that.
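The shape of such a controller can be sketched in Python using the standard sqlite3 module. This is an illustration under stated assumptions, not Shotwell's Vala code: the begin_impl()/commit_impl() hooks, the depth counter used for nesting, StubCollection, and the photos table are all hypothetical, and the sketch assumes that commit() undoes what begin() did (commit the SQLite transaction, then thaw).

```python
import sqlite3
from abc import ABC, abstractmethod


class TransactionController(ABC):
    """Nestable begin()/commit(); only the outermost pair does real work."""

    def __init__(self) -> None:
        self._depth = 0

    def begin(self) -> None:
        self._depth += 1
        if self._depth == 1:
            self.begin_impl()

    def commit(self) -> None:
        if self._depth == 0:
            raise RuntimeError("commit() called without a matching begin()")
        self._depth -= 1
        if self._depth == 0:
            self.commit_impl()

    @abstractmethod
    def begin_impl(self) -> None: ...

    @abstractmethod
    def commit_impl(self) -> None: ...


class StubCollection:
    """Hypothetical collection exposing only a freeze/thaw counter."""

    def __init__(self) -> None:
        self.freeze_depth = 0

    def freeze(self) -> None:
        self.freeze_depth += 1

    def thaw(self) -> None:
        self.freeze_depth -= 1


class SqliteTransactionController(TransactionController):
    def __init__(self, collection: StubCollection, db: sqlite3.Connection) -> None:
        super().__init__()
        self.collection = collection
        self.db = db

    def begin_impl(self) -> None:
        self.collection.freeze()   # (a) hold back signals
        self.db.execute("BEGIN")   # (b) open one SQLite transaction

    def commit_impl(self) -> None:
        self.db.execute("COMMIT")  # write everything in one go
        self.collection.thaw()     # then release the aggregated signals


# isolation_level=None lets the code issue BEGIN/COMMIT explicitly.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, orientation INTEGER)")

controller = SqliteTransactionController(StubCollection(), db)
controller.begin()
for photo_id in range(1, 501):
    db.execute("INSERT INTO photos (id, orientation) VALUES (?, ?)", (photo_id, 1))
in_transaction_during = db.in_transaction
controller.commit()
row_count = db.execute("SELECT COUNT(*) FROM photos").fetchone()[0]
```

Without the surrounding begin() and commit(), each INSERT would be committed on its own in this autocommit configuration, which is the per-operation overhead that grouping avoids.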
4. In Practice
These strategies evolved over time and were not always available in Shotwell, so different code uses different strategies to optimize performance. Signal aggregation has been around long enough that most code takes advantage of it. Freezing collections is mostly well-used, but there are still some operations (notably Commands) that do not use it yet. Transactions are the newest strategy, and code should be moved to them whenever possible.
More or better strategies may appear in the future. These represent the direction Shotwell is going to maximize performance.