
Atomic file modification

Tags: java, unix, windows

There is a region in a file (possibly small) that I want to overwrite. Assume I am calling fseek, fwrite, fsync. Is there any way to ensure atomicity of such a region-rewriting operation? That is, I need to be sure that after any failure the region will contain either only the old (pre-modification) data or only the new (modified) data, but never a mix of the two.

There are two things I want to highlight.

First: it's OK if there is no way to atomically write a region of ANY size. We can handle that by appending the data to the file, fsync'ing, then rewriting a 'pointer' area in the file, and fsync'ing again. However, if the 'pointer' write is not atomic, we can still end up with a corrupted file containing illegal pointers.

Second: I am pretty sure that writing a 1-byte region is atomic: I will never see bytes in the file that I did not put there. So we can allocate two slots for addresses and use a 1-byte switch, so that rewriting a region becomes: append the new data, sync, rewrite the unused one of the two pointer slots, sync again, then rewrite the 'switch byte' and sync once more. The overwrite operation then involves at least 3 fsync invocations.
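A minimal sketch of the scheme in point 2, written in Java since the question is tagged java. The file layout (one switch byte, two 8-byte pointer slots, then appended data), the class name `TwoSlotPointerFile` and its method names are all hypothetical, and `FileChannel.force(true)` stands in for fsync. It still relies on the assumption that a 1-byte write is atomic.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical layout: [switch byte][slot 0: long][slot 1: long][appended data...] */
final class TwoSlotPointerFile {
    static final int SWITCH_POS = 0;
    static final int SLOT_SIZE = Long.BYTES;
    static final int HEADER_SIZE = 1 + 2 * SLOT_SIZE;

    static FileChannel open(Path path) throws IOException {
        FileChannel ch = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        if (ch.size() < HEADER_SIZE) {
            // Fresh file: zeroed header, so switch = 0 and both slots hold 0.
            writeFully(ch, ByteBuffer.allocate(HEADER_SIZE), 0);
            ch.force(true);
        }
        return ch;
    }

    /** Appends data, then publishes it by flipping the switch byte. Returns the data offset. */
    static long update(FileChannel ch, byte[] data) throws IOException {
        long dataPos = ch.size();
        writeFully(ch, ByteBuffer.wrap(data), dataPos);
        ch.force(true);                                    // sync 1: the new data

        int spare = 1 - readSwitch(ch);                    // the slot not currently in use
        ByteBuffer ptr = ByteBuffer.allocate(SLOT_SIZE).putLong(dataPos);
        ptr.flip();
        writeFully(ch, ptr, 1 + (long) spare * SLOT_SIZE);
        ch.force(true);                                    // sync 2: the spare pointer slot

        writeFully(ch, ByteBuffer.wrap(new byte[] {(byte) spare}), SWITCH_POS);
        ch.force(true);                                    // sync 3: the switch byte
        return dataPos;
    }

    /** Reads the pointer stored in the slot selected by the switch byte. */
    static long currentPointer(FileChannel ch) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(SLOT_SIZE);
        readFully(ch, buf, 1 + (long) readSwitch(ch) * SLOT_SIZE);
        buf.flip();
        return buf.getLong();
    }

    private static int readSwitch(FileChannel ch) throws IOException {
        ByteBuffer b = ByteBuffer.allocate(1);
        readFully(ch, b, SWITCH_POS);
        return b.get(0) & 1;
    }

    private static void writeFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            pos += ch.write(buf, pos);
        }
    }

    private static void readFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0) throw new EOFException("header truncated");
            pos += n;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("two-slot", ".bin");
        try (FileChannel ch = open(path)) {
            long pos = update(ch, "new region contents".getBytes(StandardCharsets.UTF_8));
            System.out.println("data at " + pos + ", active pointer = " + currentPointer(ch));
        } finally {
            Files.delete(path);
        }
    }
}
```

A reader only ever follows the slot selected by the switch byte, so a failure before the third sync leaves the old pointer in effect.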

All of this would be much easier if I had atomic writes for longs, but do I really have them?

Is there any way to handle this situation without using the method mentioned in point 2?

Another question: is there any ordering guarantee between writing and syncing? For example, if I call fseek, fwrite [1], fseek, fwrite [2], fsync, can the write at [2] be committed while the write at [1] is not?

This question applies to Linux and Windows; answers for a particular system (e.g. Ubuntu version a.b.c ...) are also welcome.

asked Sep 04 '12 17:09 by andll




1 Answer

It's usually safe to assume that HDDs write a 512-byte chunk in a single operation. However, I would not rely on that. Instead, I'd go with your second solution, adding a checksum to what you write and verifying it before changing the pointer in the file.

Generally, it's good practice to add a checksum to everything written to disk.
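As an illustration of the checksum idea, here is a minimal Java sketch. The record format (length, payload, CRC32 of the payload) and the class name `ChecksummedRecord` are assumptions made for the example, not something the question prescribes; any checksum would do in place of `java.util.zip.CRC32`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical record format: [int length][payload bytes][long CRC32 of payload]. */
final class ChecksummedRecord {

    /** Wraps the payload with its length and checksum, ready to be appended to the file. */
    static byte[] encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + payload.length + Long.BYTES);
        buf.putInt(payload.length).put(payload).putLong(crc(payload));
        return buf.array();
    }

    /** Returns the payload, or null if the record is truncated or its checksum does not match. */
    static byte[] decode(byte[] record) {
        if (record.length < Integer.BYTES + Long.BYTES) return null;
        ByteBuffer buf = ByteBuffer.wrap(record);
        int len = buf.getInt();
        if (len < 0 || len > buf.remaining() - Long.BYTES) return null;
        byte[] payload = new byte[len];
        buf.get(payload);
        long stored = buf.getLong();
        return stored == crc(payload) ? payload : null;
    }

    private static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] record = encode("payload".getBytes(StandardCharsets.UTF_8));
        System.out.println("intact record valid: " + (decode(record) != null));
        record[4] ^= 0x01; // simulate a corrupted byte in the payload
        System.out.println("corrupted record valid: " + (decode(record) != null));
    }
}
```

The idea would be to read the appended record back and call `decode` on it before touching the pointer, and to do the same again on recovery after a crash.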

As for the "sync" guarantee: you can assume it holds. Sync behaviour depends on the file system and the disk, so let's say we are talking about a 'reasonable' implementation.

  • After the first sync, the data is guaranteed to have been flushed to the disk (though the disk may still hold it in its own cache), and if you read the data back you can expect to get whatever you wrote.
  • If, after the second sync, the data of both syncs is still in the disk cache, the situation you described can happen, but IMHO the probability of that is very low.
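The write, sync, write, sync sequence discussed above might look like this in Java. This is only a sketch: the class `OrderedWrites` and its method are hypothetical names, and `FileChannel.force(true)` is used as the Java counterpart of fsync. As the bullets note, it cannot see past the disk's own cache.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class OrderedWrites {

    /** Writes `first`, asks the OS to flush it, and only then writes and flushes `second`. */
    static void writeInOrder(FileChannel ch, byte[] first, long posFirst,
                             byte[] second, long posSecond) throws IOException {
        writeFully(ch, ByteBuffer.wrap(first), posFirst);
        ch.force(true);   // first sync: write [1] is handed to the disk before [2] is issued
        writeFully(ch, ByteBuffer.wrap(second), posSecond);
        ch.force(true);   // second sync: write [2]
    }

    private static void writeFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            pos += ch.write(buf, pos);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("ordered", ".bin");
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            writeInOrder(ch, new byte[] {1, 2}, 0, new byte[] {3}, 2);
            System.out.println("file size after both writes: " + ch.size());
        } finally {
            Files.delete(path);
        }
    }
}
```

Dropping the first `force` call gives the single-sync sequence from the question, where nothing in this code constrains which of the two writes reaches the disk first.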

Anyway, there's no other mechanism which will promise you data is on disk. That's why you must have checksums.

Some more info: Ensure fsync did its job

answered Sep 24 '22 21:09 by Drakosha