Suppose I have two files of 100 GB each, and I want to merge them into one and then delete the originals. On Linux we can use:
cat file1 file2 > final_file
But that has to read two big files and then write an even bigger one. Is it possible to just append one file to the other so that no IO is required? Since a file's metadata contains its location and length, I am wondering whether the merge could be done by changing the metadata alone, so that no data IO happens.
Only in obscure theory. Since disk storage is always based on blocks, and filesystems therefore store things on block boundaries, you could only append one file to another without rewriting if the first file ended perfectly on a block boundary. There are some rare filesystem configurations that use tail packing, but that would only help if the first file were already using the tail block of the previous file.
Unless that perfect scenario occurs or your filesystem is able to mark a partial block in the middle of the file (I've never heard of this), this won't work. Just to kick the edge case around, there's also no way to make such a call without changing the kernel interface (re: Link to a specific inode).
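To make the block-boundary condition concrete, here is a minimal sketch that only checks whether a file's length is an exact multiple of the filesystem block size. It does not perform any merge. The function name bytes_past_block_boundary is made up for illustration, and it assumes a Unix-like system where os.statvfs is available and where f_frsize reflects the allocation unit, which may not hold on every filesystem.

```python
import os


def bytes_past_block_boundary(path):
    """Return how many bytes of the file spill into a final, partial block.

    A result of 0 means the file ends exactly on a block boundary.
    Assumption: f_frsize (the fundamental block size reported by statvfs)
    matches the unit the filesystem actually allocates in.
    """
    size = os.stat(path).st_size
    block_size = os.statvfs(path).f_frsize
    return size % block_size
```

For a typical 4096-byte block size, a 100 GB file only hits the boundary if its length happens to be a multiple of 4096, which is exactly the "perfect scenario" described above.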
Yes, we can use the append (>>) operation instead.
cat file2 >> file1
That will still use the space consumed by file2 twice over until we can delete it.
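For comparison with the sparse approach discussed further down, the plain append can be sketched in Python as follows. This is roughly what cat file2 >> file1 followed by rm file2 does; the function name and the buffer size are illustrative choices, not anything from the question.

```python
import os
import shutil


def append_and_delete(dest_path, src_path, buffer_size=16 * 1024 * 1024):
    """Append src_path to dest_path, then remove src_path.

    Peak disk usage is size(dest) + 2 * size(src), because the source
    is only deleted after every byte has been copied.
    """
    with open(src_path, "rb") as src, open(dest_path, "ab") as dest:
        shutil.copyfileobj(src, dest, buffer_size)
        # Make sure the data reached the disk before the source is removed.
        dest.flush()
        os.fsync(dest.fileno())
    os.remove(src_path)
```

This avoids rewriting file1, but every byte of file2 is still read and written once.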
No. Unless somebody comes back with something I don't know, you're basically out of luck there. It's possible to truncate a file, forgetting about the existence of the end of it, but there is no way to forget about the existence of the start unless we get back to modifying inodes directly and altering the kernel interface to the filesystem, since that's definitely not a POSIX operation.
No again. Since we can't chop the start of a file off, we'd have to rewrite everything from the point of interest all the way to the end of the file. This would be very costly for IO and only useful after we've already read half the file.
Maybe! Sparse files allow us to store a long string of zeroes without using up nearly that much space. If we were to read file2 in large chunks starting at the end, we could write each chunk at its final position past the end of file1 and then truncate file2 to drop the chunk we just copied. file1 would immediately look (and read) as if it were the same size as both, but it would be corrupted until we were done because everything we hadn't written would be full of zeroes.
Explaining all this is another answer in itself, but if you can do a sparse allocation, you would need only your chunk read size plus a little bit extra in disk space to perform this operation. For a reference on sparse blocks in the middle of files, see http://lwn.net/Articles/357767/ or search for the term SEEK_HOLE.
Why is this "maybe" instead of "yes"? Two parts: you'd have to write your own tool (at least we're on the right site for that), and sparse files are not universally respected by filesystems and other processes alike. Fortunately you probably won't have to worry about other processes respecting your file, but you will have to worry about setting the right flags and making sure your filesystem is amenable. Last of all, you'll still be reading and re-writing the length of file2, which isn't what you want. This method does mean you can append with just a small amount of extra disk space, though, rather than using at least 2*file2 amount of space.
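As a rough illustration of what such a tool might look like, here is a minimal Python sketch of the back-to-front copy. It is not a hardened implementation: the function name append_in_place and the default chunk size are made up for this example, it assumes the two paths are different files on a filesystem that supports sparse files, and it does nothing to recover from a crash part-way through.

```python
import os


def append_in_place(dest_path, src_path, chunk_size=64 * 1024 * 1024):
    """Move the contents of src_path onto the end of dest_path.

    Chunks are copied from the END of src backwards, and src is truncated
    after each chunk, so the extra disk space needed is about one chunk
    (assumption: the filesystem stores the extended region as a hole).
    """
    dest_size = os.path.getsize(dest_path)
    remaining = os.path.getsize(src_path)

    with open(dest_path, "r+b") as dest, open(src_path, "r+b") as src:
        # Extend dest to its final length; the new tail reads as zeroes
        # until the real data is written into it.
        dest.truncate(dest_size + remaining)

        while remaining > 0:
            n = min(chunk_size, remaining)
            offset = remaining - n  # start of the last chunk still in src

            src.seek(offset)
            data = src.read(n)
            if len(data) != n:
                raise IOError("short read from source file")

            # Write the chunk at its final position in dest.
            dest.seek(dest_size + offset)
            dest.write(data)
            dest.flush()
            os.fsync(dest.fileno())

            # Only now drop the chunk from src, freeing its blocks.
            src.truncate(offset)
            remaining = offset

    os.remove(src_path)
```

A call such as append_in_place("file1", "file2") leaves file1 holding both files' contents and removes file2. Until the loop finishes, file1 contains zeroes where the not-yet-copied data belongs, and on a filesystem without sparse file support the initial truncate may allocate the full space up front, which defeats the purpose.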