I'm developing an updater for a game client so that players won't have to download the whole client whenever it gets updated.
Now, creating a standard updater isn't really hard, but it's quite slow with large files.
The client is about 1.5 GB uncompressed and consists of ~250 files. The files on the update server are gzip-compressed and get downloaded via HTTP.
The updater works like this: get the patch list from the server -> compare the files in the patch list against the local files (CRC32 / file size) -> if a file is missing, has the wrong size, or its hash doesn't match -> download the gzip-compressed file from the server -> decompress it.
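For reference, here's a minimal sketch of that loop in C# (targeting modern .NET). The PatchEntry record and its field names are my own invention, not an existing patch-list format, and ComputeCrc32 is a placeholder for whatever CRC32 implementation is already in use:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical patch-list entry; the field names are illustrative only.
record PatchEntry(string RelativePath, long Size, uint Crc32, string Url);

static class UpdateCheck
{
    public static async Task UpdateFileAsync(HttpClient http, string clientRoot, PatchEntry entry)
    {
        string localPath = Path.Combine(clientRoot, entry.RelativePath);

        // Cheap checks first (existence, size); only hash when those pass.
        bool needsDownload =
            !File.Exists(localPath) ||
            new FileInfo(localPath).Length != entry.Size ||
            ComputeCrc32(localPath) != entry.Crc32;

        if (!needsDownload) return;

        Directory.CreateDirectory(Path.GetDirectoryName(localPath)!);

        // Stream the gzip-compressed file and decompress it on the fly.
        await using Stream remote = await http.GetStreamAsync(entry.Url);
        await using var gzip = new GZipStream(remote, CompressionMode.Decompress);
        await using var output = File.Create(localPath);
        await gzip.CopyToAsync(output);
    }

    static uint ComputeCrc32(string path)
    {
        // Placeholder: plug in the CRC32 implementation you already use.
        throw new NotImplementedException();
    }
}
```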
The most time-consuming parts of the updater are generating CRC32 hashes for every file and downloading big files.
I've thought of some things that could speed it up:
Rsync-like diff updater - This would speed up the download, because it would only fetch the changed parts of a file instead of downloading the whole thing. It would be helpful because a client update usually doesn't touch large portions of the big files, but I suspect it would be overkill for this purpose (see the block-diff sketch after this list).
Better compression - Gzip saves about 200 MB when the client gets compressed. I haven't tried other compression methods, but I suspect bzip2, LZMA, or similar would save more space and speed up downloads. On the other hand, they would slow down decompression of the files.
Other file-check method - At the moment I'm using a C# CRC32 implementation, because it was faster than the standard C# MD5 implementation. Is there a faster algorithm that can tell whether a file is the same? (See the hashing sketch after this list.)
Version system - It wouldn't actually speed anything up by itself, but the updater wouldn't have to calculate hashes for every file. And with an additional "repair" function it could still check all files against the current version when the user wants to.
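On the rsync-like idea: a full rolling-checksum implementation is indeed a lot of work, but a much simpler fixed-block variant already helps when files change in place. The sketch below assumes the server publishes a per-block hash list and supports HTTP range requests; the block size and the MD5-per-block choice are arbitrary, and unlike real rsync this only detects in-place changes, not insertions that shift data:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

static class BlockDiff
{
    const int BlockSize = 1 << 20; // 1 MB blocks (arbitrary choice)

    // Returns the indices of local blocks whose hash differs from the
    // server-published hash list, i.e. the only blocks we'd need to fetch.
    public static List<int> ChangedBlocks(string localFile, byte[][] serverBlockHashes)
    {
        var changed = new List<int>();
        using var md5 = MD5.Create();
        using var stream = File.OpenRead(localFile);
        var buffer = new byte[BlockSize];

        for (int i = 0; i < serverBlockHashes.Length; i++)
        {
            int read = stream.Read(buffer, 0, BlockSize);
            byte[] localHash = md5.ComputeHash(buffer, 0, read);
            if (read == 0 || !AreEqual(localHash, serverBlockHashes[i]))
                changed.Add(i); // fetch this block via an HTTP Range request
        }
        return changed;
    }

    static bool AreEqual(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```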
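On faster file checks: non-cryptographic hashes such as xxHash are typically much faster than both CRC32 and MD5, and they're fine for change detection (though not for security). A sketch, assuming the System.IO.Hashing NuGet package, which provides XxHash64:

```csharp
using System;
using System.IO;
using System.IO.Hashing; // NuGet package: System.IO.Hashing

static class FastHash
{
    // Streams the file through xxHash64 without loading it into memory.
    public static ulong HashFile(string path)
    {
        var hasher = new XxHash64();
        using var stream = File.OpenRead(path);
        hasher.Append(stream);
        // Byte order doesn't matter as long as both sides compute it the same way.
        return BitConverter.ToUInt64(hasher.GetCurrentHash(), 0);
    }
}
```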
Which of these solutions should I use, or are there approaches I haven't listed that would work better?
Rather than downloading the entire package, you can download only the files that are new or changed.
By pre-calculating your hashes, you can save a lot of time. Your hash comparison step becomes a diff of a single file that stores hashes for all of your files. This is functionally the same as a versioning system, but the "versions" are a little bit harder to fool. It's easy for a user to open up a plain text file of versions and set the numbers to the next version to skip patches. If you want to prevent this sort of behavior, hashes are slightly more secure.
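For illustration, here's one way that diff could look in C#; the tab-separated "path hash" manifest format is my own invention, not a standard:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ManifestDiff
{
    // Hypothetical manifest format: one "relativePath<TAB>hash" pair per line.
    public static Dictionary<string, string> Load(string path) =>
        File.ReadLines(path)
            .Select(line => line.Split('\t'))
            .ToDictionary(parts => parts[0], parts => parts[1]);

    // Files whose hash changed, or which don't exist locally at all.
    public static IEnumerable<string> FilesToDownload(
        Dictionary<string, string> local, Dictionary<string, string> server) =>
        server.Where(kv => !local.TryGetValue(kv.Key, out var hash) || hash != kv.Value)
              .Select(kv => kv.Key);
}
```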
After diffing your hash file, you can request the needed files from the server. Your downloader can then stream each file in succession, and as files arrive, additional threads can unzip and move them.
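A sketch of that pipeline, assuming modern .NET's HttpClient and GZipStream; the split here is deliberately simple (one sequential downloader, thread-pool decompression), so the network link stays busy while the CPU unzips:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

static class PatchDownloader
{
    // Downloads each file in sequence, handing decompression off to the
    // thread pool so the next download can start immediately.
    public static async Task DownloadAllAsync(
        HttpClient http, IEnumerable<(string Url, string TargetPath)> files)
    {
        var decompressTasks = new List<Task>();

        foreach (var (url, targetPath) in files)
        {
            string tempGz = targetPath + ".gz.tmp";

            // Stream the compressed bytes straight to a temp file.
            await using (var remote = await http.GetStreamAsync(url))
            await using (var temp = File.Create(tempGz))
                await remote.CopyToAsync(temp);

            // Unzip in the background while the next download starts.
            decompressTasks.Add(Task.Run(() =>
            {
                using var input = File.OpenRead(tempGz);
                using var gzip = new GZipStream(input, CompressionMode.Decompress);
                using var output = File.Create(targetPath);
                gzip.CopyTo(output);
                File.Delete(tempGz);
            }));
        }

        await Task.WhenAll(decompressTasks);
    }
}
```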
I've done this in the past, and the right combination really depends on your specific implementation and the options you want to support.