 

How to ensure that data doesn't get corrupted when saving to file?

I am relatively new to C# so please bear with me.

I am writing a business application (in C#, .NET 4) that needs to be reliable. Data will be stored in files, and those files will be modified (rewritten) regularly. I am afraid that something could go wrong while saving (power loss, the application gets killed, the system freezes, ...), which I think would leave a corrupted file. I accept that data which wasn't saved yet is lost, but I must not lose data that was already saved.

My idea is to have 2 versions of every file and each time rewrite the oldest file. Then in case of unexpected end of my application at least one file should still be valid.

Is this a good approach? Is there anything else I could do? (Database is not an option)

Thank you for your time and answers.

Ben asked Oct 31 '11 17:10




1 Answer

Rather than "always write to the oldest" you can use the "safe file write" technique of:

(Assuming you want to end up saving data to foo.data, and a file with that name contains the previous valid version.)

  • Write new data to foo.data.new
  • Rename foo.data to foo.data.old
  • Rename foo.data.new to foo.data
  • Delete foo.data.old

At any one time you've always got at least one valid file, and you can tell which is the one to read just from the filename. This is assuming your file system treats rename and delete operations atomically, of course.
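The four steps above can be sketched in C# roughly as follows. This is only a minimal sketch: the class and method names (SafeFileWriter, Save) are hypothetical, and the call to Flush(true) is my own assumption about wanting the new data on disk before the renames start; it is not part of the steps as listed.

```csharp
using System.IO;
using System.Text;

static class SafeFileWriter
{
    // Hypothetical helper: saves 'contents' to 'path' (e.g. foo.data)
    // using the write-new / rename / rename / delete sequence.
    public static void Save(string path, string contents)
    {
        string newPath = path + ".new";
        string oldPath = path + ".old";

        // Step 1: write the new data to foo.data.new.
        byte[] bytes = Encoding.UTF8.GetBytes(contents);
        using (var stream = new FileStream(newPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            stream.Write(bytes, 0, bytes.Length);
            // Assumption: ask the OS to flush its buffers as well,
            // so the data is durable before any rename happens.
            stream.Flush(true);
        }

        // Step 2: rename foo.data to foo.data.old (nothing to rename on the very first save).
        if (File.Exists(path))
        {
            // File.Move fails if the target exists, so clear any leftover .old first.
            // File.Delete does nothing if the file is missing.
            File.Delete(oldPath);
            File.Move(path, oldPath);
        }

        // Step 3: rename foo.data.new to foo.data.
        File.Move(newPath, path);

        // Step 4: delete foo.data.old.
        File.Delete(oldPath);
    }
}
```

Calling SafeFileWriter.Save("foo.data", text) twice in a row should leave only foo.data behind, containing the second version.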

  • If foo.data and foo.data.new exist, load foo.data; foo.data.new may be broken (e.g. power off during write)
  • If foo.data.old and foo.data.new exist (and foo.data does not), both should be valid, but something died between the two renames - you may want to load the foo.data.old version anyway
  • If foo.data and foo.data.old exist, then foo.data should be fine, but again something went wrong before the delete, or possibly the file couldn't be deleted.
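The load-time rules above boil down to "prefer foo.data, otherwise fall back to foo.data.old". A minimal sketch, with hypothetical names (SafeFileReader, ChooseFileToLoad); treating a lone foo.data.new as untrustworthy is my assumption, since it may come from a first save that never completed:

```csharp
using System.IO;

static class SafeFileReader
{
    // Hypothetical helper: returns the file that should be loaded for 'path'
    // (e.g. foo.data), or null if no trustworthy file exists.
    public static string ChooseFileToLoad(string path)
    {
        string oldPath = path + ".old";

        // foo.data exists: it is the valid copy, whether or not
        // a .new (possibly broken) or .old (not yet deleted) is lying around.
        if (File.Exists(path))
            return path;

        // foo.data is missing but foo.data.old exists: the save died between
        // the two renames, so fall back to the previous version.
        if (File.Exists(oldPath))
            return oldPath;

        // Nothing usable. Assumption: a lone foo.data.new is not trusted,
        // because the write that produced it may have been interrupted.
        return null;
    }
}
```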

Alternatively, simply always write to a new file, including some sort of monotonically increasing counter - that way you'll never lose any data due to bad writes. The best approach depends on what you're writing though.
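One way the counter idea could look, purely as a sketch: the naming scheme (foo.data.1, foo.data.2, ...) and the names VersionedFileWriter, SaveNewVersion and LatestVersion are all made up for illustration.

```csharp
using System;
using System.Globalization;
using System.IO;

static class VersionedFileWriter
{
    // Hypothetical scheme: each save goes to a brand-new file named
    // <path>.<counter>; existing files are never rewritten.
    public static string SaveNewVersion(string path, string contents)
    {
        long next = LatestVersion(path) + 1;
        string versionedPath = path + "." + next.ToString(CultureInfo.InvariantCulture);
        File.WriteAllText(versionedPath, contents);
        return versionedPath;
    }

    // Returns the highest counter found next to 'path', or 0 if there is none.
    public static long LatestVersion(string path)
    {
        string dir = Path.GetDirectoryName(Path.GetFullPath(path));
        string prefix = Path.GetFileName(path) + ".";
        long latest = 0;

        foreach (string file in Directory.GetFiles(dir, prefix + "*"))
        {
            string name = Path.GetFileName(file);
            if (!name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                continue;

            // Only purely numeric suffixes count as versions.
            long n;
            string suffix = name.Substring(prefix.Length);
            if (long.TryParse(suffix, NumberStyles.None, CultureInfo.InvariantCulture, out n) && n > latest)
                latest = n;
        }
        return latest;
    }
}
```

Note that the newest file can still be incomplete if the write was interrupted, so a reader would need some way to detect that and fall back to the previous number.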

You could also use File.Replace for this, which basically performs the last three steps for you. (Pass in null for the backup name if you don't want to keep a backup.)
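A rough sketch of the File.Replace variant, with hypothetical names (ReplaceFileWriter, Save). As far as I know, File.Replace expects the destination file to exist already, so the sketch falls back to a plain File.Move for the very first save; treat that branch as an assumption to verify on your platform.

```csharp
using System.IO;

static class ReplaceFileWriter
{
    // Hypothetical helper: writes to foo.data.new, then swaps it into place.
    public static void Save(string path, string contents)
    {
        string newPath = path + ".new";

        // Write the new data to the temporary file first.
        File.WriteAllText(newPath, contents);

        if (File.Exists(path))
        {
            // Replaces the contents of 'path' with 'newPath'.
            // Passing null as the backup name means no backup is kept.
            File.Replace(newPath, path, null);
        }
        else
        {
            // Assumption: first save, nothing to replace yet.
            File.Move(newPath, path);
        }
    }
}
```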

Jon Skeet answered Oct 26 '22 23:10
