I'm writing a backup solution (of sorts). Put simply, it copies a file from a location on C:\ and pastes it to a location on Z:\
To keep it fast, before copying it checks whether the file already exists at the backup location. If it does, it performs a few 'calculations' to work out whether the copy should go ahead or whether the backup file is already up to date. It is these calculations I'm finding difficult.
Originally, I compared the file sizes, but this is not good enough because a file can easily change and still be the same size (for example, saving the character C in Notepad produces a file of the same size as saving the character T).
So, I need to find out if the modified date differs. At the moment, I get the file info using the FileInfo class, but after reviewing all the fields there is nothing which appears to be suitable.
How can I check to ensure that I'm copying files which have been modified?
EDIT: I have seen suggestions on SO to use MD5 checksums, but I'm concerned this may be a problem as some of the files I'm comparing will be up to 10GB.
Going by modified date will be unreliable - the computer clock can go backwards when it synchronizes, or when manually adjusted. Some programs might not behave well when modifying or copying files in terms of managing the modified date.
Going by the archive bit might work in a controlled environment but what happens if another piece of software is running that uses the archive bit as well?
The Windows archive bit is evil and must be stopped
If you want (almost) complete reliability then what you should do is store a hash value of the last backed up version using a good hashing function like SHA1, and if the hash value changes then you upload the new copy.
Here is the SHA1 class along with a code sample on the bottom:
http://msdn.microsoft.com/en-us/library/system.security.cryptography.sha1.aspx
Just run the file bytes through it and store the hash value. Pass a FileStream to it instead of loading your file into memory with a byte array to reduce memory usage, especially for large files.
You can combine this with modified date in various ways to tweak your program as needed for speed and reliability. For example, you can check modified dates for most backups and periodically run a hash checker that runs while the system is idle to make sure nothing got missed. Sometimes the modified date will change but the file contents are still the same (i.e. got overwritten with the same data), in which case you can avoid resending the whole file after you recompute the hash and realize it is still the same.
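The combined approach described above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: `BackupCheck`, `NeedsBackup` and `storedHash` are hypothetical names, and the sketch assumes the backup copy carries the same last-write time as the source (adjust if your copy routine or target file system does not preserve it).

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper, not part of the original answer.
public static class BackupCheck
{
    // Streams the file through SHA-1 so large files are never loaded into memory in full.
    public static byte[] ComputeHash(string path)
    {
        using (SHA1 sha1 = SHA1.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return sha1.ComputeHash(stream);
        }
    }

    // Returns true when the source should be copied to the backup location again.
    // storedHash is the hash recorded at the last backup, or null if none was recorded.
    public static bool NeedsBackup(string sourcePath, string backupPath, byte[] storedHash)
    {
        var source = new FileInfo(sourcePath);
        var backup = new FileInfo(backupPath);

        // No backup yet, or the sizes differ: a copy is definitely needed.
        if (!backup.Exists || source.Length != backup.Length)
            return true;

        // Cheap check (assumption: timestamps are preserved by the copy).
        // Same size and same last-write time is treated as unchanged.
        if (source.LastWriteTimeUtc == backup.LastWriteTimeUtc)
            return false;

        // Timestamps differ: hash the source to see whether the content really changed.
        if (storedHash == null)
            return true;
        return !ComputeHash(sourcePath).SequenceEqual(storedHash);
    }
}
```

The timestamp shortcut is the risky part, which is why a periodic pass that hashes everything is still worth running when the system is idle.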
Most version control systems use some kind of combined approach with hashes and modified dates.
Your approach will generally involve some kind of risk management with a compromise between performance and reliability if you don't want to do a full backup and send all the data over each time. It's important to do "full backups" once in a while for this reason.
You can compare files by their hashes:
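// Requires: using System.IO; using System.Security.Cryptography;
private byte[] GetFileHash(string fileName)
{
    // SHA1.Create() is explicit; the parameterless HashAlgorithm.Create() is obsolete in modern .NET.
    using (SHA1 sha1 = SHA1.Create())
    using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        // ComputeHash reads from the stream, so the file is not loaded into memory in full.
        return sha1.ComputeHash(stream);
    }
}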
If the content has changed, the hashes will be different (barring a hash collision, which is extremely unlikely to happen by accident).
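One detail worth spelling out: in C#, `==` on two `byte[]` values compares references, not contents, so the hashes need to be compared element by element. A minimal sketch, where `HashComparison`, `HashesMatch` and `ToHex` are hypothetical helper names:

```csharp
using System;
using System.Linq;

// Hypothetical helpers for working with hash values returned as byte arrays.
public static class HashComparison
{
    // Compares two hash values byte by byte; a missing hash never matches.
    public static bool HashesMatch(byte[] first, byte[] second)
    {
        if (first == null || second == null)
            return false;
        return first.SequenceEqual(second);
    }

    // Hex form of a hash, convenient for storing it in a text file or database.
    public static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "");
    }
}
```

Storing the hex string of the last backed-up version lets the next run hash only the source file instead of reading both copies.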
You may like to check out the FileSystemWatcher class.
"This class lets you monitor a directory for changes and will fire an event when something is modified."
Your code can then handle the event and process the file.
Code source - MSDN:
// Create a new FileSystemWatcher and set its properties.
FileSystemWatcher watcher = new FileSystemWatcher();
watcher.Path = args[1];
/* Watch for changes in LastAccess and LastWrite times, and
the renaming of files or directories. */
watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
| NotifyFilters.FileName | NotifyFilters.DirectoryName;
// Only watch text files.
watcher.Filter = "*.txt";
// Add event handlers.
watcher.Changed += new FileSystemEventHandler(OnChanged);
watcher.Created += new FileSystemEventHandler(OnChanged);
watcher.Deleted += new FileSystemEventHandler(OnChanged);
watcher.Renamed += new RenamedEventHandler(OnRenamed);
// Begin watching; no events are raised until this is set.
watcher.EnableRaisingEvents = true;
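The snippet above does not show the `OnChanged` and `OnRenamed` handlers. One possible shape for them, in a backup scenario, is to collect the affected paths and process them later. This is a sketch under assumptions: `ChangeTracker` and `TakePending` are hypothetical names, and the handlers are instance methods here rather than the static methods the snippet implies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical class: collects the paths of files that need to be backed up.
public class ChangeTracker
{
    // A set, because a single save often raises several Changed events for one file.
    private readonly HashSet<string> pending =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    // Watcher events arrive on background threads, so access is synchronized.
    private readonly object gate = new object();

    // Matches FileSystemEventHandler; usable for Changed, Created and Deleted.
    public void OnChanged(object source, FileSystemEventArgs e)
    {
        lock (gate)
        {
            if (e.ChangeType == WatcherChangeTypes.Deleted)
                pending.Remove(e.FullPath);
            else
                pending.Add(e.FullPath);
        }
    }

    // Matches RenamedEventHandler: forget the old name and queue the new one.
    public void OnRenamed(object source, RenamedEventArgs e)
    {
        lock (gate)
        {
            pending.Remove(e.OldFullPath);
            pending.Add(e.FullPath);
        }
    }

    // Returns the queued paths and clears the queue.
    public List<string> TakePending()
    {
        lock (gate)
        {
            var result = new List<string>(pending);
            pending.Clear();
            return result;
        }
    }
}
```

With this shape the handlers would be attached as `watcher.Changed += tracker.OnChanged;` and so on, and the backup routine would call `TakePending()` periodically to find out which files to copy. Note that a watcher only sees changes made while the program is running, so a full scan at startup is still needed.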