
Transactional handling of text files on Windows

I have multiple Windows programs (running on Windows 2000, XP and 7), which handle text files of different formats (csv, tsv, ini and xml). It is very important not to corrupt the content of these files during file IO. Every file should be safely accessible by multiple programs concurrently, and should be resistant to system crashes. This SO answer suggests using an in-process database, so I'm considering using the Microsoft Jet Database Engine, which can handle delimited text files (csv, tsv) and supports transactions. I used Jet before, but I don't know whether Jet transactions really tolerate unexpected crashes or shutdowns in the commit phase, and I don't know what to do with non-delimited text files (ini, xml). I don't think it's a good idea to try to implement fully ACID-compliant file IO by hand.

What is the best way to implement transactional handling of text files on Windows? I have to be able to do this in both Delphi and C#.

Thank you for your help in advance.

EDIT

Let's see an example based on @SirRufo's idea. Forget about concurrency for a second, and let's concentrate on crash tolerance.

  1. I read the contents of a file into a data structure in order to modify some fields. When I'm in the process of writing the modified data back into the file, the system can crash.

  2. File corruption can be avoided if I never write the data back into the original file. This can be easily achieved by creating a new file, with a timestamp in the filename every time a modification is saved. But this is not enough: the original file will stay intact, but the newly written one may be corrupt.

  3. I can solve this by putting a "0" character after the timestamp, which would mean that the file hasn't been validated. I would end the writing process by a validation step: I would read the new file, compare its contents to the in-memory structure I'm trying to save, and if they are the same, then change the flag to "1". Each time the program has to read the file, it chooses the newest version by comparing the timestamps in the filename. Only the latest version must be kept, older versions can be deleted.

  4. Concurrency could be handled by waiting on a named mutex before reading or writing the file. When a program gains access to the file, it must start with checking the list of filenames. If it wants to read the file, it will read the newest version. On the other hand, writing can be started only if there is no version newer than the one read last time.

This is a rough, oversimplified, and inefficient approach, but it shows what I'm thinking about. Writing files is unsafe, but maybe there are simple tricks like the one above which can help to avoid file corruption.
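One such trick: the validation flag in step 3 can usually be replaced by the classic write-then-rename pattern. Write the new contents to a temporary file in the same directory, flush it to disk, then atomically rename it over the original (on Windows this is `MoveFileEx` with `MOVEFILE_REPLACE_EXISTING`, or `ReplaceFile`). A minimal Python sketch of the idea; the helper name is mine:

```python
import os
import tempfile

def atomic_write_text(path, data, encoding="utf-8"):
    """Write data to path so that readers see either the old or the
    new contents, never a torn mixture. Helper name is illustrative."""
    dirpath = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory (same volume),
    # so that the final rename can be atomic.
    fd, tmp = tempfile.mkstemp(dir=dirpath, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before renaming
        os.replace(tmp, path)     # atomic rename-over on POSIX and on NTFS
    except BaseException:
        os.unlink(tmp)            # leave no half-written temp file behind
        raise

atomic_write_text("settings.ini", "[main]\nvalue=42\n")
print(open("settings.ini").read().splitlines()[1])  # value=42
```

If the process crashes before the `os.replace`, the original file is untouched and only a stray temp file remains; if it crashes after, the new version is already complete on disk. This does not solve concurrency by itself; a named mutex (or an exclusively-created lock file) is still needed around read-modify-write cycles, as in step 4.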

UPDATE

Open-source solutions, written in Java:

  • Atomic File Transactions: article-1, article-2, source code
  • Java Atomic File Transaction (JAFT): project home
  • XADisk: tutorial, source code
  • AtomicFile: description, source code
kol, asked Dec 05 '12


3 Answers

You are creating a nightmare for yourself by trying to handle these transactions and states in your own code across multiple systems. This is why Larry Ellison (Oracle's CEO) is a billionaire and most of us are not. If you absolutely must use files, then set up Oracle or another database that supports LOB and CLOB objects. I store very large SVG files in such a table for my company, so that we can add and render large maps in our systems without any code changes. The files can be pulled from the table, passed to your users in a buffer, and returned to the database when they are done. Set up the appropriate security and record locking, and your problem is solved.
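The same idea works with an in-process engine, which also matches the question's original suggestion: store each document's full text in a LOB-style column and let the database's transactions guarantee atomicity. A sketch using SQLite purely as a stand-in (the table and column names are mine, not from any of the products mentioned above):

```python
import sqlite3

conn = sqlite3.connect("docs.db")
conn.execute("CREATE TABLE IF NOT EXISTS docs (name TEXT PRIMARY KEY, body TEXT)")

def save_doc(name, body):
    # The whole upsert runs in one transaction: after a crash, readers
    # see either the previous body or the new one, never a partial write.
    with conn:
        conn.execute(
            "INSERT INTO docs(name, body) VALUES(?, ?) "
            "ON CONFLICT(name) DO UPDATE SET body = excluded.body",
            (name, body))

def load_doc(name):
    row = conn.execute("SELECT body FROM docs WHERE name = ?",
                       (name,)).fetchone()
    return row[0] if row else None

save_doc("config.ini", "[main]\nvalue=1\n")
save_doc("config.ini", "[main]\nvalue=2\n")
print(load_doc("config.ini").splitlines()[-1])  # value=2
```

Concurrent access is then the engine's problem, not yours: the database serializes writers and journals every change, which is exactly the machinery the question is trying to avoid reimplementing by hand.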

Jeff D., answered Oct 12 '22


How about using NTFS file streams? Write multiple named (numbered or timestamped) streams to the same filename. Each version is stored in a different stream, but all of them live in the same "file", preserving the data and providing a roll-back mechanism. When you reach a point of certainty, delete some of the previous streams.

Alternate data streams have been part of NTFS since Windows NT, so they cover all the versions you list. The approach should be crash-tolerant: you always have the previous version/stream, plus the original, to recover or roll back to.

Just a late night thought.

http://msdn.microsoft.com/en-gb/library/windows/desktop/aa364404%28v=vs.85%29.aspx
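An NTFS alternate data stream is addressed by appending `:streamname` to the filename, so each version can be written to a stream of one and the same file. A small sketch; the helper names are mine, and the actual stream I/O only works on an NTFS volume under Windows:

```python
import os

def ads_path(filename, stream):
    # NTFS addresses an alternate data stream as "file.ext:streamname".
    return f"{filename}:{stream}"

def write_version(filename, stream, text):
    # Windows/NTFS only: on other file systems ':' is either invalid
    # in a name or just an ordinary character, not a stream separator.
    with open(ads_path(filename, stream), "w") as f:
        f.write(text)

print(ads_path("data.csv", "v20121205-1"))  # data.csv:v20121205-1
```

One caveat worth knowing: alternate data streams are silently dropped when a file is copied to a non-NTFS volume (FAT32, network shares, zip archives), so they are better suited to local roll-back than to files that travel.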

Despatcher, answered Oct 28 '22


What you are asking for is transactionality, which is not possible without implementing the mechanisms of an RDBMS yourself, given your own requirement:

"It is very important not to corrupt the content of these files during file IO"

Pick a DBMS.

Jack G., answered Oct 28 '22