I'm developing an updater for a game client so that players won't have to download the whole client whenever it gets updated.
Now, creating a standard updater isn't really hard, but it's quite slow with large files.
The client is about 1.5 GB uncompressed and consists of ~250 files. The files on the update server are gzip-compressed and get downloaded via HTTP.
The updater works like this: get the patch list from the server -> compare the files in the patch list against the local files (CRC32 / file size) -> if a file is missing, has the wrong size, or its hash doesn't match -> download the gzip-compressed file from the server -> decompress it.
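For reference, here's a minimal sketch of that loop in C# (targeting modern .NET). The PatchEntry record and its field names are my own invention, not an existing patch-list format, and ComputeCrc32 is a placeholder for whatever CRC32 implementation is already in use:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical patch-list entry; the field names are illustrative only.
record PatchEntry(string RelativePath, long Size, uint Crc32, string Url);

static class UpdateCheck
{
    public static async Task UpdateFileAsync(HttpClient http, string clientRoot, PatchEntry entry)
    {
        string localPath = Path.Combine(clientRoot, entry.RelativePath);

        // Cheap checks first (existence, size); only hash when those pass.
        bool needsDownload =
            !File.Exists(localPath) ||
            new FileInfo(localPath).Length != entry.Size ||
            ComputeCrc32(localPath) != entry.Crc32;

        if (!needsDownload) return;

        Directory.CreateDirectory(Path.GetDirectoryName(localPath)!);

        // Stream the gzip-compressed file and decompress it on the fly.
        await using Stream remote = await http.GetStreamAsync(entry.Url);
        await using var gzip = new GZipStream(remote, CompressionMode.Decompress);
        await using var output = File.Create(localPath);
        await gzip.CopyToAsync(output);
    }

    static uint ComputeCrc32(string path)
    {
        // Placeholder: plug in the CRC32 implementation you already use.
        throw new NotImplementedException();
    }
}
```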
The most time-consuming parts of the updater are generating CRC32 hashes for every file and downloading big files.
I've thought of some things that could speed it up:
Rsync-like diff updater - This would speed up the download, because it would only fetch the changed parts of a file instead of downloading the whole thing. It would be helpful because a client update usually doesn't touch large portions of the big files, but I suspect it would be overkill for this purpose (see the block-diff sketch after this list).
Better compression - Gzip saves about 200 MB when the client gets compressed. I haven't tried other compression methods, but I suspect bzip2, LZMA, or similar would save more space and speed up downloads. On the other hand, they would slow down decompression of the files.
Other file-check method - At the moment I'm using a C# CRC32 implementation, because it was faster than the standard C# MD5 implementation. Is there a faster algorithm that can tell whether a file is the same? (See the hashing sketch after this list.)
Version system - It wouldn't actually speed anything up by itself, but the updater wouldn't have to calculate hashes for every file. And with an additional "repair" function it could still check all files against the current version when the user wants to.
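On the rsync-like idea: a full rolling-checksum implementation is indeed a lot of work, but a much simpler fixed-block variant already helps when files change in place. The sketch below assumes the server publishes a per-block hash list and supports HTTP range requests; the block size and the MD5-per-block choice are arbitrary, and unlike real rsync this only detects in-place changes, not insertions that shift data:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

static class BlockDiff
{
    const int BlockSize = 1 << 20; // 1 MB blocks (arbitrary choice)

    // Returns the indices of local blocks whose hash differs from the
    // server-published hash list, i.e. the only blocks we'd need to fetch.
    public static List<int> ChangedBlocks(string localFile, byte[][] serverBlockHashes)
    {
        var changed = new List<int>();
        using var md5 = MD5.Create();
        using var stream = File.OpenRead(localFile);
        var buffer = new byte[BlockSize];

        for (int i = 0; i < serverBlockHashes.Length; i++)
        {
            int read = stream.Read(buffer, 0, BlockSize);
            byte[] localHash = md5.ComputeHash(buffer, 0, read);
            if (read == 0 || !AreEqual(localHash, serverBlockHashes[i]))
                changed.Add(i); // fetch this block via an HTTP Range request
        }
        return changed;
    }

    static bool AreEqual(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```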
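On faster file checks: non-cryptographic hashes such as xxHash are typically much faster than both CRC32 and MD5, and they're fine for change detection (though not for security). A sketch, assuming the System.IO.Hashing NuGet package, which provides XxHash64:

```csharp
using System;
using System.IO;
using System.IO.Hashing; // NuGet package: System.IO.Hashing

static class FastHash
{
    // Streams the file through xxHash64 without loading it into memory.
    public static ulong HashFile(string path)
    {
        var hasher = new XxHash64();
        using var stream = File.OpenRead(path);
        hasher.Append(stream);
        // Byte order doesn't matter as long as both sides compute it the same way.
        return BitConverter.ToUInt64(hasher.GetCurrentHash(), 0);
    }
}
```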
Which of these solutions should I use, or are there approaches I haven't listed that would work better?
Rather than downloading the entire package, you can download only the files that are new or changed.
By pre-calculating your hashes, you can save a lot of time. Your hash comparison step becomes a diff of a single file that stores hashes for all of your files. This is functionally the same as a versioning system, but the "versions" are a little bit harder to fool. It's easy for a user to open up a plain text file of versions and set the numbers to the next version to skip patches. If you want to prevent this sort of behavior, hashes are slightly more secure.
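For illustration, here's one way that diff could look in C#; the tab-separated "path hash" manifest format is my own invention, not a standard:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ManifestDiff
{
    // Hypothetical manifest format: one "relativePath<TAB>hash" pair per line.
    public static Dictionary<string, string> Load(string path) =>
        File.ReadLines(path)
            .Select(line => line.Split('\t'))
            .ToDictionary(parts => parts[0], parts => parts[1]);

    // Files whose hash changed, or which don't exist locally at all.
    public static IEnumerable<string> FilesToDownload(
        Dictionary<string, string> local, Dictionary<string, string> server) =>
        server.Where(kv => !local.TryGetValue(kv.Key, out var hash) || hash != kv.Value)
              .Select(kv => kv.Key);
}
```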
After diffing your hash file, you can request the needed files from the server. Your downloader can then stream each file in succession, and as files arrive, additional threads can unzip and move them.
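A sketch of that pipeline, assuming modern .NET's HttpClient and GZipStream; the split here is deliberately simple (one sequential downloader, thread-pool decompression), so the network link stays busy while the CPU unzips:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

static class PatchDownloader
{
    // Downloads each file in sequence, handing decompression off to the
    // thread pool so the next download can start immediately.
    public static async Task DownloadAllAsync(
        HttpClient http, IEnumerable<(string Url, string TargetPath)> files)
    {
        var decompressTasks = new List<Task>();

        foreach (var (url, targetPath) in files)
        {
            string tempGz = targetPath + ".gz.tmp";

            // Stream the compressed bytes straight to a temp file.
            await using (var remote = await http.GetStreamAsync(url))
            await using (var temp = File.Create(tempGz))
                await remote.CopyToAsync(temp);

            // Unzip in the background while the next download starts.
            decompressTasks.Add(Task.Run(() =>
            {
                using var input = File.OpenRead(tempGz);
                using var gzip = new GZipStream(input, CompressionMode.Decompress);
                using var output = File.Create(targetPath);
                gzip.CopyTo(output);
                File.Delete(tempGz);
            }));
        }

        await Task.WhenAll(decompressTasks);
    }
}
```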
I've done this in the past, and the right combination really depends on your specific implementation and the options you want to support.