Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C write in the middle of a binary file without overwriting any existing content

Today's problem is that I need to write an array of numbers in a binary file at a starting position. I have the position where it should start, and I don't want to overwrite values after that, just want to insert the array at the starting position in the file. E.g:

12345

Let's push 456 at position 2:

12456345

I know that probably I'll have to implement it by myself, but I want to know what's your opinion on how to implement that as efficiently as possible.

like image 758
Frederico Schardong Avatar asked May 06 '12 02:05

Frederico Schardong


People also ask

How to write in a file without overwriting C?

Yes, it is possible. By opening it with r+ you open it for 'reading and writing' (while w opens it for writing freshly). t -> Opens the file as text, specifically parsing \n as \r\n under Windows. That's better.


2 Answers

Here's a function extend_file_and_insert() that does the job, more or less.

#include <sys/stat.h>
#include <unistd.h>

enum { BUFFERSIZE = 64 * 1024 };

#define MIN(x, y) (((x) < (y)) ? (x) : (y))

/*
off_t   is signed
ssize_t is signed
size_t  is unsigned

off_t   for lseek() offset and return
size_t  for read()/write() length
ssize_t for read()/write() return
off_t   for st_size
*/

static int extend_file_and_insert(int fd, off_t offset, char const *insert, size_t inslen)
{
    char buffer[BUFFERSIZE];
    struct stat sb;
    int rc = -1;

    if (fstat(fd, &sb) == 0)
    {
        if (sb.st_size > offset)
        {
            /* Move data after offset up by inslen bytes */
            size_t bytes_to_move = sb.st_size - offset;
            off_t read_end_offset = sb.st_size; 
            while (bytes_to_move != 0)
            {
                ssize_t bytes_this_time = MIN(BUFFERSIZE, bytes_to_move);
                ssize_t rd_off = read_end_offset - bytes_this_time;
                ssize_t wr_off = rd_off + inslen;
                lseek(fd, rd_off, SEEK_SET);
                if (read(fd, buffer, bytes_this_time) != bytes_this_time)
                    return -1;
                lseek(fd, wr_off, SEEK_SET);
                if (write(fd, buffer, bytes_this_time) != bytes_this_time)
                    return -1;
                bytes_to_move -= bytes_this_time;
                read_end_offset -= bytes_this_time; /* Added 2013-07-19 */
            }   
        }   
        lseek(fd, offset, SEEK_SET);
        write(fd, insert, inslen);
        rc = 0;
    }   
    return rc;
}

(Note the additional line added 2013-07-19; it was a bug that only shows when the buffer size is smaller than the amount of data to be copied up the file. Thanks to malat for pointing out the error. Code now tested with BUFFERSIZE = 4.)

This is some small-scale test code:

#include <fcntl.h>
#include <string.h>

static const char base_data[] = "12345";
typedef struct Data
{
    off_t       posn;
    const char *data;
} Data;
static const Data insert[] =
{
    {  2, "456"                       },
    {  4, "XxxxxxX"                   },
    { 12, "ZzzzzzzzzzzzzzzzzzzzzzzzX" },
    { 22, "YyyyyyyyyyyyyyyY"          },
};  
enum { NUM_INSERT = sizeof(insert) / sizeof(insert[0]) };

int main(void)
{
    int fd = open("test.dat", O_RDWR | O_TRUNC | O_CREAT, 0644);
    if (fd > 0)
    {
        ssize_t base_len = sizeof(base_data) - 1;
        if (write(fd, base_data, base_len) == base_len)
        {
            for (int i = 0; i < NUM_INSERT; i++)
            {
                off_t length = strlen(insert[i].data);
                if (extend_file_and_insert(fd, insert[i].posn, insert[i].data, length) != 0)
                    break;
                lseek(fd, 0, SEEK_SET);
                char buffer[BUFFERSIZE];
                ssize_t nbytes;
                while ((nbytes = read(fd, buffer, sizeof(buffer))) > 0)
                    write(1, buffer, nbytes);
                write(1, "\n", 1);
            }
        }
        close(fd);
    }
    return(0);
}

It produces the output:

12456345
1245XxxxxxX6345
1245XxxxxxX6ZzzzzzzzzzzzzzzzzzzzzzzzZ345
1245XxxxxxX6ZzzzzzzzzzYyyyyyyyyyyyyyyYzzzzzzzzzzzzzzZ345

It should be tested on some larger files (ones bigger than BUFFERSIZE, but it would be sensible to test with a BUFFERSIZE a lot smaller than 64 KiB; I used 32 bytes and it seemed to be OK). I've only eyeballed the results but the patterns are designed to make it easy to see that they are correct. The code does not check any of the lseek() calls; that's a minor risk.

like image 106
Jonathan Leffler Avatar answered Oct 27 '22 05:10

Jonathan Leffler


First, use ftruncate() to enlarge the file to the final size. Then copy everything from the old end over to the new end, working your way back to the insertion point. Then overwrite the middle contents with the data you want to insert. This is as efficient as it gets, I think, because filesystems don't generally offer true "insertion" in the middle of files.

like image 29
John Zwinck Avatar answered Oct 27 '22 03:10

John Zwinck