Canonical
on 1 August 2017
This is a guest post by Thomi Richards. If you would like to contribute a guest post, please contact ubuntu-devices@canonical.com
One of the nicest features of snap packages is that they allow publishers to control which dependencies are shipped with their application. It’s endlessly frustrating to find that your users are experiencing broken software because some library that you depend upon was updated without your knowledge. Snap packages solve this, but at a price: they’re much larger files to download. This is an acceptable tradeoff for the initial installation, but quickly becomes tiresome when clients download updates for snaps they already have installed. In this post I’ll describe what we’ve done to minimise the amount of data that has to be downloaded when clients are updating from one revision of a snap to another.
Currently, systems with snaps installed periodically query the snap store for any available updates. For each snap the client has installed, snapd (the snap package daemon) will send the following information:
- The snap id of the snap in question.
- The architecture of the client.
- The channel the client is following for this snap.
- The revision the client currently has installed.
The store then follows a reasonably simple process. It checks:
- Has the publisher released a newer revision for the channel & architecture the client asked about?
- Does the client have permission to access this snap? If the snap is private or non-free some additional authorisation checks are performed.
Finally, the store replies with a payload that includes a URL from which the new snap revision may be downloaded. The client can now download and install the newer revision.
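To make the update check concrete, here is a minimal Python sketch of the store-side decision described above. It is an illustration only, not the store's actual code: the class names, field names, the `check_refresh` function and the example URL are all hypothetical, and it assumes that a higher revision number means a newer revision.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class RefreshRequest:
    """What the client reports for one installed snap (hypothetical field names)."""
    snap_id: str
    architecture: str
    channel: str
    revision: int


@dataclass(frozen=True)
class Release:
    """The revision currently released for one (snap, channel, architecture)."""
    revision: int
    download_url: str
    private: bool = False


def check_refresh(
    request: RefreshRequest,
    releases: Dict[Tuple[str, str, str], Release],
    authorised_snap_ids: FrozenSet[str] = frozenset(),
) -> Optional[str]:
    """Return a download URL if the client should update, otherwise None."""
    key = (request.snap_id, request.channel, request.architecture)
    release = releases.get(key)

    # 1. Has a newer revision been released for this channel and architecture?
    #    (Assumption: a larger revision number is a newer revision.)
    if release is None or release.revision <= request.revision:
        return None

    # 2. Is the client allowed to access this snap?
    if release.private and request.snap_id not in authorised_snap_ids:
        return None

    return release.download_url


# Hypothetical data: revision 8 is released to stable for amd64.
RELEASES = {
    ("example-snap-id", "stable", "amd64"): Release(
        revision=8, download_url="https://store.example/what-snap_8.snap"
    )
}
```

A client that reports revision 6 on the stable channel would get the URL back, while a client already on revision 8 would get `None`.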
In the last few months we’ve rolled out a feature called “Download Deltas”. As the name suggests, this involves clients being able to download a binary delta between the revision of the snap they have installed and the revision they’re updating to. The end result is that snap updates need to download less data, and thus should be faster.
How are delta files generated?
We’ve deployed a new private service (named ‘snap-delta-service’) that’s responsible for building binary deltas between any two revisions of a snap. Every time a publisher releases a snap to a channel we ask snap-delta-service to generate a binary delta between the previously released snap revision and the newly released snap revision. For example, consider the following scenario:
I have a snap named what-snap (a useful command-line tool to make converting from snap names to snap ids and back again simple) with the following releases:
    $ snapcraft status what-snap
    Track    Arch    Channel    Version    Revision
    latest   amd64   stable     1.0        6
                     candidate  1.0        ^
                     beta       1.0        ^
                     edge       1.0        8
In case you’re unfamiliar with the snapcraft status output, this can be interpreted as: For the amd64 architecture (I’ve omitted armhf listing for clarity), I have revision 6 in the stable channel, and revision 8 in the edge channel. I’ve tested revision 8 and I want to release it to the stable channel:
    $ snapcraft release what-snap 8 stable
    Track    Arch    Channel    Version    Revision
    latest   amd64   stable     1          8
                     candidate  1          ^
                     beta       1          ^
                     edge       1          8

This will queue a binary delta from revision 6 to revision 8 for this snap. Once the delta has been generated, clients with a sufficiently new snapd will download the delta file instead of the full revision they’re updating to. This can be seen in the snapd logs (I’ve omitted some log messages to improve the clarity of this example):
    Deltas enabled. Adding header X-Ubuntu-Delta-Formats: xdelta3
    Available deltas returned by store: [{ 6 8 xdelta3 https://store/14SRjukNTZYlZOH0EBZ4Uq3OLChy7RMI_6_8_xdelta3.delta https://store/14SRjukNTZYlZOH0EBZ4Uq3OLChy7RMI_6_8_xdelta3.delta 1056245 77361810044ff3f552c582e8d…✄…bb2a01416079a2fd5c7f9887b58}]
    Successfully downloaded delta for "what-snap" at what-snap_8.snap.xdelta3-6-to-8.partial
    Successfully applied delta for "what-snap" at what-snap_8.snap.xdelta3-6-to-8.partial, saving 1114635 bytes.

There are a few interesting points in this log:
- On the first line you can see that clients advertise which delta formats they understand. Today snapd understands the xdelta3 format.
- The second line shows us that the store found a delta suitable for this update. The logged fields are, in order: source revision, target revision, delta format, anonymous download URL, authenticated download URL, delta file size, and delta sha3_384.
- The rest of the log shows that the delta file was downloaded and applied successfully. We also log how much data was saved by using a delta file. In this particular case the saving was small in absolute terms (about 1.1 MB), but this is a really small snap.
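The delta entry in the log can be pictured as a small record, and choosing a delta as a lookup over such records. The sketch below is an assumption-laden illustration rather than snapd's code: `DeltaInfo`, `pick_delta`, the placeholder URL and the placeholder hash are all hypothetical names and values, with the fields laid out in the order the log prints them.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class DeltaInfo:
    """One delta offered by the store (hypothetical field names)."""
    from_revision: int
    to_revision: int
    format: str
    anon_download_url: str
    download_url: str
    size: int          # delta file size in bytes
    sha3_384: str      # hex digest of the delta file


# Formats this hypothetical client advertises to the store.
SUPPORTED_FORMATS = ("xdelta3",)


def pick_delta(
    deltas: Iterable[DeltaInfo],
    installed_revision: int,
    target_revision: int,
    supported: Sequence[str] = SUPPORTED_FORMATS,
) -> Optional[DeltaInfo]:
    """Return the first delta that goes from the installed revision to the
    target revision in a format the client understands, or None."""
    for delta in deltas:
        if (
            delta.from_revision == installed_revision
            and delta.to_revision == target_revision
            and delta.format in supported
        ):
            return delta
    return None  # no usable delta: fall back to downloading the full snap


# Placeholder values; only the revisions, format and size come from the log.
EXAMPLE = DeltaInfo(
    6, 8, "xdelta3",
    "https://store.example/6_8_xdelta3.delta",
    "https://store.example/6_8_xdelta3.delta",
    1056245,
    "0" * 96,
)
```

If `pick_delta` returns `None`, for instance because the client is on revision 5 rather than 6, the client would simply download the full snap as before.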
It’s important to note that after this process snapd verifies that the assertions about this snap package still apply (‘assertions’ in the Ubuntu Core vernacular are cryptographically signed documents stating facts about a snap). This means that the result of applying a delta must be bit-for-bit identical with the full target snap.
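One way to picture the bit-for-bit requirement is as a digest comparison: hash the snap rebuilt from the delta and compare it with the digest recorded for the full target snap. The sketch below assumes the check boils down to a SHA3-384 comparison (the same hash family the log reports for the delta); the function names are hypothetical and this is not snapd's implementation.

```python
import hashlib
from typing import Iterable


def sha3_384_hex(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks and return the hex digest."""
    digest = hashlib.sha3_384()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def rebuilt_snap_is_identical(rebuilt_chunks: Iterable[bytes], expected_hex: str) -> bool:
    """True only if the rebuilt snap hashes to the expected digest.

    A single differing byte changes the digest, so a match means the result
    of applying the delta is the same file as the full download.
    """
    return sha3_384_hex(rebuilt_chunks) == expected_hex.lower()
```

In practice the chunks would come from reading the rebuilt `.snap` file in blocks, so the whole file never has to be held in memory.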
What bandwidth savings can I expect?
The exact amount of data saved depends on a number of factors, including how much has changed between the source and target revisions and the language used to build the snap. The table below shows a range of different update scenarios across several different snaps:
snap name | source revision | target revision | target snap size | delta size | bytes saved | delta %ge |
---|---|---|---|---|---|---|
docker | 88 | 102 | 37.3 MB | 28.9 MB | 8.4 MB | 77% |
nextcloud | 1337 | 1474 | 186.9 MB | 102.3 MB | 84.6 MB | 55% |
core | 1577 | 1689 | 83.3 MB | 23.1 MB | 60.3 MB | 28% |
vectr | 1 | 2 | 95.6 MB | 16.4 MB | 79.2 MB | 17% |
rocketchat-server | 707 | 709 | 169.3 MB | 1.0 MB | 168.3 MB | 1% |
A smaller value in the ‘delta %ge’ column means better bandwidth savings (for example, ‘17%’ means the client only needs to download 17% of the full snap data). As you can see, the actual savings vary quite a bit. The best way to ensure updates are as small as possible is to release as frequently as you can: More frequent updates translate into smaller deltas. For the same reason, clients on the edge channel will typically gain more from download deltas than clients on the stable channel.
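The ‘delta %ge’ figures can be reproduced from the other columns: each is the delta size divided by the target snap size, rounded to a whole percentage. A short Python sketch, using the rounded sizes from the table (the function name is just for illustration):

```python
# (snap name, target snap size in MB, delta size in MB), copied from the table.
UPDATES = [
    ("docker", 37.3, 28.9),
    ("nextcloud", 186.9, 102.3),
    ("core", 83.3, 23.1),
    ("vectr", 95.6, 16.4),
    ("rocketchat-server", 169.3, 1.0),
]


def delta_percentage(target_size_mb: float, delta_size_mb: float) -> int:
    """Delta size as a whole-number percentage of the full target snap."""
    return round(100 * delta_size_mb / target_size_mb)


for name, target, delta in UPDATES:
    pct = delta_percentage(target, delta)
    print(f"{name}: the delta is {pct}% of the full snap")
```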
Where do we go from here?
Download deltas have been switched on in production for a while now. We’re confident that they’re saving us bandwidth, but there are several things we can do to improve them.
The xdelta3 algorithm we’re using does not always perform well, particularly for languages that build a single large binary (like Go). There are other tools we could use that would result in better compression in these cases, but we need to do more investigation and engineering before we can implement these.
Currently we decide which deltas to generate at snap release time, and we only generate a delta from the previously released revision to the newly released revision for each channel. However, sometimes there are clients still on older revisions of the snap, and they will therefore not get a delta when they update to the latest released revision. In these cases it’d be nice if we could react to that situation and generate deltas for those clients as well. Another possibility is that we could send clients a chain of delta files that will get them from their current revision to the latest revision, but in most cases the chain will end up no cheaper to download than the latest revision itself, and may also get expensive to calculate.
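The trade-off with delta chains can be sketched in a few lines of Python. This is a thought experiment, not something the store does today: `find_chain`, `choose_download`, the `deltas` mapping and all sizes below are hypothetical, and the sketch assumes at most one delta is available from each source revision.

```python
from typing import Dict, List, Optional, Tuple


def find_chain(
    deltas: Dict[int, Tuple[int, int]], installed: int, target: int
) -> Optional[List[int]]:
    """Follow available deltas from `installed` to `target`.

    `deltas` maps a source revision to (target revision, delta size in bytes).
    Returns the list of delta sizes along the chain, or None if no chain exists.
    """
    sizes: List[int] = []
    current = installed
    while current != target:
        # Stop if there is no delta from here, or if we have taken more steps
        # than there are deltas (which would mean we are going in circles).
        if current not in deltas or len(sizes) >= len(deltas):
            return None
        current, size = deltas[current]
        sizes.append(size)
    return sizes


def choose_download(chain: Optional[List[int]], full_size: int) -> Tuple[str, int]:
    """Pick whichever option needs fewer bytes: the delta chain or the full snap."""
    if chain and sum(chain) < full_size:
        return ("delta-chain", sum(chain))
    return ("full", full_size)


# Illustrative numbers only: deltas exist for 4 -> 6 and 6 -> 8.
AVAILABLE = {4: (6, 900_000), 6: (8, 1_000_000)}
```

With these made-up sizes, a client on revision 4 would download 1,900,000 bytes of deltas if the full snap were 2,200,000 bytes, but would fall back to the full snap if it were only 1,500,000 bytes. The longer the chain, the more likely the second case becomes.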
Finally, we need to keep track of the ‘delta hit/miss rate’ – what percentage of clients with pending updates were able to update themselves using a delta? We obviously want to maximise that number, but as people adopt Ubuntu Core we may well need to adjust our delta generation algorithm to keep on top of client use patterns. I’ll keep an eye on the metrics we collect, and will post an analysis once I have something interesting to talk about.