Shotwell Architecture Overview: Photo Pipeline
1. Non-destructive Editing and the Pipeline
Shotwell provides a non-destructive editor for transforming photos. Instead of writing modifications into the photo files (either altering the original, or leaving the original untouched and saving each round of modifications as a successive generation), Shotwell stores all modifications in its database. These records describe operations to be performed on the original; they are not the modified photos (i.e. bitmaps) themselves. Thus, when a photo is loaded in Shotwell, it must be run through one or more pipeline operations before it's ready to be displayed on the screen or exported to disk or network.
This strategy provides many advantages over a destructive photo editor. First, only one copy of the photo is stored on disk, saving disk space. Second, because the original is never touched, it's easy to revert to it (a kind of master Undo function). Third, certain operations can be tweaked by the user rather than completely undone and reattempted. (For example, Shotwell shows the entire photograph again when the user wants to update the crop.) Finally, while saving successive generations also leaves the original untouched, each generation is stored in lossy JPEG format, so photo quality degrades with every generation; storing operations avoids this.
This strategy does have a major disadvantage. Because the transformations are stored as operations rather than bitmaps, the operations must be performed each time the photo is accessed. Speed is a constant concern, especially since some operations are quite expensive. Various optimizations have been added to Shotwell to alleviate this problem, including the thumbnail cache, which always stores thumbnails in their fully transformed form so that no operations need to be performed on them, and readahead via background threads. However, even with these techniques in place, the pipeline is still a performance concern.
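The following minimal Python sketch illustrates the non-destructive model: a record keeps the original pixels plus a list of stored edits, and every render replays those edits against the original. The names PhotoRecord, brighten, and crop, and the tiny grayscale "image" made of nested lists, are hypothetical stand-ins, not Shotwell's database schema or classes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Image = List[List[int]]  # hypothetical stand-in: rows of 8-bit grayscale values


def brighten(img: Image, amount: int) -> Image:
    # Returns a new image; the input is never modified.
    return [[min(255, max(0, p + amount)) for p in row] for row in img]


def crop(img: Image, x: int, y: int, w: int, h: int) -> Image:
    return [row[x:x + w] for row in img[y:y + h]]


# Stored edits are plain data (name + parameters), as a database row would be.
OPERATIONS = {"brighten": brighten, "crop": crop}


@dataclass
class PhotoRecord:
    original: Image
    edits: List[Tuple] = field(default_factory=list)

    def render(self) -> Image:
        # Every access replays all stored edits against the untouched original.
        img = self.original
        for name, *params in self.edits:
            img = OPERATIONS[name](img, *params)
        return img

    def revert(self) -> None:
        # The "master Undo": drop the edits; the original was never changed.
        self.edits.clear()
```

Because an edit is just a stored parameter set, tweaking the crop means replacing one tuple rather than undoing and redoing work. The cost is that render() must rerun the whole list on every access.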
2. A Note on Scaling Photos
libjpeg offers an interesting optimization: if a JPEG is decoded at a scale that is a power of one-half of its full size (1/2, 1/4, 1/8), libjpeg can skip decoding a number of cells, which results in a speed boost. LibRaw offers a similar optimization for RAW photos, but the only factor available is one-half. (PNG does not offer this optimization, as far as I can discern.)
This is much more efficient than an unscaled decode followed by a scaling operation, even one using a low-quality scaling function such as NEAREST. GDK takes advantage of this optimization if the scale is specified for the load-and-decode operation. (If the requested scale is not exactly a power of one-half, GDK applies the optimization as far as it can and then scales the result manually, which is still faster than scaling an unscaled photo.) Additionally, scaled-down decoded pixbufs consume far less memory. That may not seem like much of a concern on today's multi-gigabyte desktop systems, but it does exact a toll on the L2 cache, because many transformations touch every byte.
Thus, one practice within Shotwell is important to understand: never ask for a photo's pixbuf at its unscaled size unless absolutely necessary. Exporting often requires it. Displaying on the screen rarely (if ever) does, unless the photo's dimensions are smaller than the screen's, in which case performance is probably not an issue anyway.
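The arithmetic behind a scaled decode can be sketched as follows. Given a full-size JPEG and a requested size, pick the largest libjpeg reduction (1/1, 1/2, 1/4, 1/8) whose output is still at least as large as the request, so any remaining resampling step only shrinks. The function plan_scaled_decode is hypothetical and is not GDK's code; it only assumes libjpeg's documented behavior of rounding reduced dimensions up.

```python
from typing import Tuple

# Scale denominators supported by classic libjpeg (with scale_num = 1).
JPEG_DENOMS = (1, 2, 4, 8)


def _ceil_div(a: int, b: int) -> int:
    # Integer ceiling division, matching libjpeg's rounding of reduced sizes.
    return -(-a // b)


def plan_scaled_decode(full_w: int, full_h: int,
                       target_w: int, target_h: int) -> Tuple[int, int, int, bool]:
    """Return (denom, decoded_w, decoded_h, needs_resample)."""
    best = 1
    for d in JPEG_DENOMS:
        # Reduced sizes shrink as d grows, so the last match is the largest d.
        if _ceil_div(full_w, d) >= target_w and _ceil_div(full_h, d) >= target_h:
            best = d
    decoded_w, decoded_h = _ceil_div(full_w, best), _ceil_div(full_h, best)
    # A manual scaling pass is needed only if the reduction didn't land exactly.
    needs_resample = (decoded_w, decoded_h) != (target_w, target_h)
    return best, decoded_w, decoded_h, needs_resample
```

For a 4000 x 3000 photo requested at 1000 x 750, this chooses 1/4 and needs no resampling. At 8 bits per RGB channel, the decoded pixbuf shrinks from 36,000,000 bytes to 2,250,000 bytes.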
3. The Pipeline
When a photo is accessed via Photo.get_pixbuf(), the caller supplies a Scaling object that specifies the dimensions of the returned pixbuf. For get_photo_with_exceptions(), the caller supplies an Exceptions bit flag indicating which steps of the pipeline should be skipped. (This is important for many editing tools, which want to show the photo with their own transformation left out.)
Note that these functions are thread-safe, allowing a pixbuf to be loaded in the background. There is currently no provision for asynchronous (non-blocking) loading.
Another item of future work is to refactor the pipeline so that each stage is held in a separate class, for modularity. This would complement further work to modularize the editing process in the editing tools.
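The Exceptions idea can be made concrete with a minimal Python sketch: a bit-flag set in which each bit skips one pipeline step. The class name, member names, and bit values are hypothetical. They follow the step order described in this document, not Shotwell's Vala source. Load and scaling are modeled here as steps that always run.

```python
from enum import IntFlag
from typing import List


class Exceptions(IntFlag):
    """Hypothetical flags: each set bit skips one pipeline step."""
    NONE = 0
    REDEYE = 1 << 0
    STRAIGHTEN = 1 << 1
    CROP = 1 << 2
    ADJUST = 1 << 3
    ORIENTATION = 1 << 4


# Steps in the order this document describes; NONE marks a step that always runs.
PIPELINE = [
    ("load", Exceptions.NONE),
    ("redeye", Exceptions.REDEYE),
    ("straighten", Exceptions.STRAIGHTEN),
    ("crop", Exceptions.CROP),
    ("scale", Exceptions.NONE),
    ("adjust", Exceptions.ADJUST),
    ("orientation", Exceptions.ORIENTATION),
]


def steps_to_run(exceptions: Exceptions) -> List[str]:
    # A step is skipped only if it has a flag and that flag is set.
    return [name for name, flag in PIPELINE if not (flag and flag & exceptions)]
```

A crop tool, for example, could pass Exceptions.CROP so the user sees the whole photograph while adjusting the crop.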
3.1. Load, Decode and Precache
To avoid the expense of loading and decoding an image file every time the pipeline is rerun, the pipeline loads and decodes the image at its full size once and saves the resulting data in a cache; subsequent requests use the cached copy whenever possible. To conserve memory, the cached copy is discarded automatically once it has gone 180 seconds without being requested.
The image data is loaded from the photo's backing file, which may be one of three files: its master file (the original photo); a mimic file (a full-sized JPEG of the photo, generated when decoding the master file is slow or expensive, as with RAW); or an editable file (a full-sized version of the photo with all transformations applied, which is made available to external editors to modify, thereby preserving the master).
After the cache has been populated, all subsequent pipeline steps operate on the cached copy.
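A minimal Python sketch of such a precache follows: full-size decoded data keyed by backing-file path, guarded by a lock because the pipeline may run on background threads. Entries are dropped once they have gone 180 seconds without a request. DecodedImageCache and its injectable clock are hypothetical, and eviction here happens lazily on the next access rather than on a timer.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class DecodedImageCache:
    """Hypothetical precache of full-size decoded images with an idle timeout."""

    def __init__(self, ttl: float = 180.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()  # pipeline may be run from several threads
        self._entries: Dict[str, Tuple[object, float]] = {}
        self.decode_count = 0

    def get(self, path: str, decode: Callable[[str], object]) -> object:
        with self._lock:
            now = self._clock()
            self._evict_expired(now)
            entry = self._entries.get(path)
            if entry is None:
                data = decode(path)  # the expensive load + full-size decode
                self.decode_count += 1
            else:
                data = entry[0]
            # Every request resets the idle timer for this entry.
            self._entries[path] = (data, now)
            return data

    def _evict_expired(self, now: float) -> None:
        expired = [p for p, (_, last) in self._entries.items()
                   if now - last >= self._ttl]
        for p in expired:
            del self._entries[p]
```

Holding the lock during decode keeps the sketch simple but serializes decodes. A real implementation might decode outside the lock.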
3.2. Red-eye reduction
Red-eye reductions are stored in the database in the coordinate system of the unscaled, unrotated photo. Because the image has not yet been scaled or rotated at this point in the pipeline, the stored coordinates can be applied directly to the image data.
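Because the stored coordinates match the image at this stage, a correction can index pixels directly with no coordinate conversion. The Python sketch below shows this with a common, simple red-eye technique: inside a circle, red-dominant pixels have their red channel replaced by the mean of green and blue. This technique and the (cx, cy, radius) parameters are illustrative assumptions, not necessarily Shotwell's algorithm or stored format.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), 8 bits per channel


def reduce_redeye(img: List[List[Pixel]], cx: int, cy: int,
                  radius: int) -> List[List[Pixel]]:
    """cx, cy, radius are in unscaled, unrotated coordinates, which is
    exactly the coordinate system of the image at this pipeline stage."""
    out = [row[:] for row in img]  # leave the input untouched
    for y in range(max(0, cy - radius), min(len(img), cy + radius + 1)):
        for x in range(max(0, cx - radius), min(len(img[y]), cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 > radius ** 2:
                continue  # outside the circle
            r, g, b = img[y][x]
            if r > g and r > b:  # only touch red-dominant pixels
                out[y][x] = ((g + b) // 2, g, b)
    return out
```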
3.3. Straightening
The straightening angle is stored in the database in degrees. The image is rotated by this angle and translated slightly so that all of its pixels fall inside the +x, +y quadrant; the result is stored in a pixbuf large enough to hold the entire rotated image.
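The geometry of this step can be sketched in Python. Rotate the four corners of the image about the origin, take their bounding box as the new pixbuf size, and translate by the negated minimum x and y so every corner lands in the +x, +y quadrant. The function name and the rotation sign convention are assumptions; Shotwell's actual code may differ in rounding and direction.

```python
import math
from typing import Tuple


def straightened_canvas(width: float, height: float,
                        degrees: float) -> Tuple[int, int, float, float]:
    """Return (canvas_w, canvas_h, dx, dy) for an image rotated by `degrees`."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    corners = [(0.0, 0.0), (width, 0.0), (0.0, height), (width, height)]
    # Rotate each corner about the origin (sign convention is an assumption).
    rotated = [(x * c - y * s, x * s + y * c) for x, y in corners]
    xs = [x for x, _ in rotated]
    ys = [y for _, y in rotated]
    # Round away floating-point noise before taking the ceiling.
    canvas_w = math.ceil(round(max(xs) - min(xs), 6))
    canvas_h = math.ceil(round(max(ys) - min(ys), 6))
    # Translating by (-min x, -min y) moves every corner into +x, +y.
    return canvas_w, canvas_h, -min(xs), -min(ys)
```

For a 4000 x 3000 photo straightened by 10 degrees, the canvas grows to 4461 x 3650. The bounding box of a rotated w x h rectangle is w|cos θ| + h|sin θ| by w|sin θ| + h|cos θ|.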
3.4. Crop
The crop region is stored in unscaled but straightened coordinates: it is expressed relative to the full-size pixbuf created in the straightening step, and whenever the straightening angle changes, it is moved and shrunk as needed so that all of its corners stay inside the image. The crop region is applied to the pixbuf created by the straightening step.
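The "corners stay inside the image" constraint can be checked by mapping each crop corner back into the original, unrotated image. First undo the straightening translation (dx, dy), then rotate by the negated angle. The Python helper below is a hypothetical sketch of that test, using the same assumed rotation convention x' = x cos θ - y sin θ, y' = x sin θ + y cos θ. It is not Shotwell's crop-adjustment code.

```python
import math
from typing import Tuple


def crop_fits(crop: Tuple[float, float, float, float], width: float,
              height: float, degrees: float, dx: float, dy: float) -> bool:
    """crop = (x, y, w, h) in straightened (canvas) coordinates. True if all
    four corners land on real image pixels, not on the empty canvas corners
    that straightening creates."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    x, y, w, h = crop
    eps = 1e-9  # tolerance for floating-point error at the edges
    for px, py in [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]:
        # Undo the translation, then apply the inverse rotation (-theta).
        ux, uy = px - dx, py - dy
        ox, oy = ux * c + uy * s, -ux * s + uy * c
        if not (-eps <= ox <= width + eps and -eps <= oy <= height + eps):
            return False
    return True
```

With no straightening, any crop within the photo fits. At 10 degrees, a crop covering the whole enlarged canvas fails because its corners fall on empty space.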
3.5. Scaling
If the requested scale is smaller than the full-size image, the red-eye-corrected, straightened, and cropped image from the previous steps is scaled down to the requested size, and the result is saved in a pixbuf. This is done before color adjustment to reduce the number of pixels that the color arithmetic must be applied to.
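A minimal Python sketch of a best-fit scale computation follows. It finds the largest size that preserves the aspect ratio, fits within a bounding box, and never scales up, which matches "only if the requested scale is smaller". The function fit_within is a hypothetical stand-in for whatever a Scaling object computes.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_w: int, max_h: int) -> Tuple[int, int]:
    """Largest aspect-preserving size inside max_w x max_h; never enlarges."""
    if width <= max_w and height <= max_h:
        return width, height  # already fits: skip the scaling step entirely
    ratio = min(max_w / width, max_h / height)
    return max(1, round(width * ratio)), max(1, round(height * ratio))
```

Scaling a 4000 x 3000 crop to fit 1024 x 768 cuts the pixel count from 12,000,000 to 786,432, roughly 15 times fewer pixels for the color adjustment step to process.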
3.6. Color Adjustment
Each color adjustment is applied via the PhotoTransformer class, which coalesces the stored color adjustments, looks for optimizations, and applies the transformations in a single loop. Scaled load-and-decode and color adjustment are the most time-consuming operations in the pipeline. Color adjustment performs multiple floating-point operations on every pixel in the pixbuf, and will be the subject of further optimization.
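One way to coalesce adjustments is to compose them into a single 256-entry lookup table. The floating-point math then runs 256 times per channel instead of once per pixel per adjustment, and the pixels are visited in one loop. The Python sketch below shows this technique. It is an illustration, not a description of PhotoTransformer. It only works for adjustments that act on each channel independently; adjustments that mix channels (such as saturation or tint) need a different treatment. The exposure and contrast functions and their constants are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Adjustment = Callable[[float], float]  # maps a channel value in [0, 1]


def exposure(x: float) -> float:
    return x * 1.2  # hypothetical fixed exposure boost


def contrast(x: float) -> float:
    return (x - 0.5) * 1.5 + 0.5  # hypothetical fixed contrast stretch


def build_lut(adjustments: Sequence[Adjustment]) -> List[int]:
    """Coalesce per-channel adjustments into one 256-entry lookup table."""
    lut = []
    for v in range(256):
        x = v / 255.0
        for adjust in adjustments:
            # Clamp after each step, mirroring a step-by-step application.
            x = min(1.0, max(0.0, adjust(x)))
        lut.append(round(x * 255.0))
    return lut


def apply_lut(pixels: List[Tuple[int, ...]], lut: List[int]) -> List[Tuple[int, ...]]:
    # One pass over the pixels, however many adjustments were coalesced.
    return [tuple(lut[c] for c in px) for px in pixels]
```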
3.7. Orientation
Finally, the pixbuf is rotated to match the stored orientation. Some cameras have accelerometers that detect the camera's orientation and record it in a field of the photo's EXIF data. Thus, even a photo that has not been edited in Shotwell may still have at least one transformation applied in real time. Shotwell does not rewrite even a newly imported photo to fix up its orientation, because doing so would cause quality loss from re-encoding.
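The EXIF Orientation tag takes values 1 to 8. Each value corresponds to a flip, a rotation, or both that makes the stored image display upright. The Python sketch below maps each value to that correction on a small image stored as rows of pixel values. The function names and the list-of-rows representation are hypothetical stand-ins for pixbuf operations. Unknown or missing tags are treated as 1 (no change).

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # hypothetical stand-in: rows of pixel values


def identity(img: Image) -> Image:
    return [row[:] for row in img]


def flip_h(img: Image) -> Image:
    return [row[::-1] for row in img]


def flip_v(img: Image) -> Image:
    return [row[:] for row in img[::-1]]


def rot180(img: Image) -> Image:
    return [row[::-1] for row in img[::-1]]


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def rotate_cw(img: Image) -> Image:
    # 90 degrees clockwise: the bottom row becomes the left column.
    return [list(col) for col in zip(*img[::-1])]


def rotate_ccw(img: Image) -> Image:
    # 90 degrees counterclockwise.
    return transpose(img)[::-1]


def transverse(img: Image) -> Image:
    # Flip across the anti-diagonal.
    return rot180(transpose(img))


# EXIF Orientation value -> correction that makes the image display upright.
EXIF_ORIENTATION: Dict[int, Callable[[Image], Image]] = {
    1: identity, 2: flip_h, 3: rot180, 4: flip_v,
    5: transpose, 6: rotate_cw, 7: transverse, 8: rotate_ccw,
}


def apply_orientation(img: Image, tag: int) -> Image:
    return EXIF_ORIENTATION.get(tag, identity)(img)
```

Applying this correction at display time, instead of rewriting the file on import, is what lets Shotwell avoid re-encoding the original.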
Note that if Shotwell is built with MEASURE_PIPELINE defined (use the configure script), Shotwell will log timing information on each step of the pipeline.