What filesystems allow repositioning the beginning of a file?

Typical filesystems, and the POSIX interface, only allow a file to be resized at the end. Typically the size of a file "on disk" after it has been closed is equal to the offset of the read/write position when it was closed. Seeking before closing is also known as "repositioning the end-of-file."

A file that contains a queue of data would be more efficiently represented by an operation to remove the beginning of the file. The on-disk allocation blocks at the beginning could be freed, and needless copying minimized.
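For contrast, the portable fallback that this question hopes to avoid can be sketched as follows: slide the remaining bytes toward offset 0, then cut the tail with `ftruncate`. This is only an illustration under stated assumptions. The helper name `drop_front_by_copy` and the 64 KiB buffer size are hypothetical choices, not part of any standard API, and the sketch assumes no other process is writing to the file at the same time.

```c
#define _XOPEN_SOURCE 700     /* pread(), pwrite(), ftruncate() */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: remove the first `n` bytes of the file at `path`
 * by copying the remainder forward and truncating the tail.
 * Returns 0 on success, -1 on error. */
int drop_front_by_copy(const char *path, off_t n)
{
    static char buf[65536];          /* copy buffer; size is arbitrary */
    struct stat st;
    off_t src, dst = 0;
    int rc = -1;

    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1)
        goto out;

    if (n <= 0) {                    /* nothing to remove */
        rc = 0;
        goto out;
    }
    if (n > st.st_size)              /* removing everything */
        n = st.st_size;

    src = n;
    while (src < st.st_size) {
        ssize_t got = pread(fd, buf, sizeof buf, src);
        if (got == -1)
            goto out;
        if (got == 0)                /* file shrank underneath us */
            break;
        assert(dst < src);           /* destination always trails source */
        for (ssize_t done = 0; done < got; ) {
            ssize_t w = pwrite(fd, buf + done, (size_t)(got - done), dst + done);
            if (w == -1)
                goto out;
            done += w;
        }
        src += got;
        dst += got;
    }
    rc = ftruncate(fd, dst);         /* POSIX only resizes at the end */
out:
    close(fd);
    return rc;
}
```

Every byte that survives is read and rewritten once, so the cost grows with the amount of data left in the queue rather than with the amount removed. That is the "needless copying" described above.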

Is this directly supported by any common filesystem format and/or operating system? What kind of interface is used to do so? (For example, a Linux fcntl selector.) I'm pretty sure I've heard of this kind of thing in practice.

Potatoswatter asked Aug 07 '13 08:08



2 Answers

No. Not in the Unix world, at any rate.

If you look inside DBMS or Unix(ish) file system internals, they can easily truncate or extend datasets at the beginning, at the end, or anywhere in the middle. But they don't export that functionality; it's not part of the Unix API heritage or the POSIX standard, so any "truncate at beginning" or "extend at beginning" APIs would be non-standard ("proprietary").

The marginal utility of such functions is also unclear. Who would use them? How often?

Unix files (flat sequences of bytes/characters) have proven themselves simple and effective for application code, but a poor foundation for layered data structures. Twenty-five years ago that statement was debatable; today it's just an observed historical reality.

Unix developers used to argue "all things can be reduced to files" and "we can ace random access through clever seeking." Those claims never quite worked out, however. Unix never, for example, matched the random-access record management prowess of minicomputer and mainframe operating systems (e.g. DEC RMS, IBM ISAM and VSAM). And while those implementing file systems, queues, tries, relational databases, and object stores do occasionally drop contents into files, and do use files for sequential operations like logging, they rarely depend on character streams as their low-level format. Instead they use structures like B-trees and hash tables to directly manage disk blocks, memory segments, and other underlying resources.

Character streams belong with tables, documents, and objects--abstractions suitable for client applications. If you want a queue, consider using existing middleware (e.g. RabbitMQ, ZeroMQ, Redis, or a DBMS) that already has this covered. If you must build it yourself, you probably wouldn't build it atop a simplistic character stream abstraction. So while truncate/extend at beginning is potentially useful for some things (e.g. log trimming instead of segmented log rotation), it's unlikely to be a Big Win for most data structure implementations.

Jonathan Eunice answered Nov 07 '22 06:11



Actually, Linux does have an interface that does what you are requesting: the FALLOC_FL_COLLAPSE_RANGE flag to fallocate. According to the fallocate(2) man page, it has been available since Linux 3.15 on ext4 (extent-based files only) and XFS, and other filesystems may support it as well. Although the fallocate interface allows you to specify byte offsets and lengths, in practice a COLLAPSE_RANGE call will only work if both the offset and the length are multiples of the filesystem block size, and the range must end before the end of the file.

For more information please see the Fine Manual for the fallocate(2) system call: http://man7.org/linux/man-pages/man2/fallocate.2.html
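As a rough illustration, a call that drops whole blocks from the front of a file might look like the sketch below. The helper names `round_down_to_block` and `drop_front_collapse` are hypothetical. Using `st_blksize` from `fstat` as the filesystem block size is an assumption that usually holds on ext4 and XFS but is not guaranteed in general. On filesystems without support the call is expected to fail, typically with `EOPNOTSUPP`.

```c
#define _GNU_SOURCE           /* exposes fallocate() in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>     /* FALLOC_FL_COLLAPSE_RANGE */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: largest multiple of `block` that is <= n (for n >= 0). */
off_t round_down_to_block(off_t n, off_t block)
{
    assert(block > 0);
    return n - (n % block);
}

/* Hypothetical helper: remove up to `n` bytes from the front of `fd`,
 * rounded down to whole blocks. Returns the number of bytes actually
 * removed (possibly 0), or -1 with errno set, e.g. EOPNOTSUPP when the
 * filesystem does not implement FALLOC_FL_COLLAPSE_RANGE. */
off_t drop_front_collapse(int fd, off_t n)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;

    /* Assumption: st_blksize matches the filesystem block size. */
    off_t len = round_down_to_block(n, (off_t)st.st_blksize);

    /* The collapsed range must end before EOF, so this call cannot
     * empty the file; fall back to doing nothing in that case. */
    if (len <= 0 || len >= st.st_size)
        return 0;

    if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, len) == -1)
        return -1;
    return len;
}
```

Because only whole blocks can be removed, a queue built on this would still need to record how many bytes of the first remaining block have already been consumed, for example in a small header or a side file.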

Theodore Ts'o answered Nov 07 '22 05:11
