
C++ slow read/seekg

Tags: c++, file, seekg

In my program I read in a file (for now only a test file of about 200k data points; later there will be millions). What I currently do is:

for (int i=0;i<n;i++) {
    fid.seekg(4,ios_base::cur);
    fid.read((char*) &x[i],8);
    fid.seekg(8,ios_base::cur);
    fid.read((char*) &y[i],8);
    fid.seekg(8,ios_base::cur);
    fid.read((char*) &z[i],8);
    fid.read((char*) &d[i],8);
    d[i] = (d[i] - p)/p;
    z[i] *= cc;
}

Here n denotes the number of points to read.

Afterwards I write them out again with:

for (int i=0;i<n;i++) {
    fid.write((char*) &d[i],8);
    fid.write((char*) &z[i],8);

    temp = (d[i] + 1) * p;
    fid.write((char*) &temp,8);
}

The writing is faster than the reading (time measured with clock_t).

My question is: have I made some obvious mistake in the reading code, or is this behavior to be expected?

I'm using Win XP with a magnetic drive.

yours magu_

magu_ asked Dec 01 '22 19:12


2 Answers

You're using seekg too often. I see that you're using it to skip bytes, but you could just as well read a complete 52-byte record into a buffer and then skip the unwanted bytes inside that buffer:

char buffer[52];

for (int i=0;i<n;i++) {
    fid.read(buffer, sizeof(buffer));
    memcpy(&x[i], &buffer[4], sizeof(x[i]));
    memcpy(&y[i], &buffer[20], sizeof(y[i]));
    memcpy(&z[i], &buffer[36], sizeof(z[i]));
    memcpy(&d[i], &buffer[44], sizeof(d[i]));
}

Alternatively, you can define a struct that mirrors the layout of one record in your file:

#pragma pack(push, 1)
struct Item
{
    char dummy1[4]; // skip 4 bytes
    __int64 x;
    char dummy2[8]; // skip 8 bytes
    __int64 y;
    char dummy3[8]; // skip 8 bytes
    __int64 z;
    __int64 d;
};
#pragma pack(pop)

then declare an array of those structs and read all data at once:

Item* items = new Item[n];
fid.read(reinterpret_cast<char*>(items), n * sizeof(Item)); // read() takes a char*; reading all the data at once will be much faster

(Remark: I don't know the types of x, y, z and d, so I assume __int64 here. Any other 8-byte type, such as double, gives the same layout.)
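For completeness, here is a minimal self-contained sketch of the struct approach. It assumes the four fields are double (the question does not say; the arithmetic on d and z only suggests a floating-point type) and that the file is in the machine's native byte order. The names Record and read_records are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Assumed layout of one 52-byte record; the field types (double) are a guess.
#pragma pack(push, 1)
struct Record {
    char   pad1[4];  // 4 unused bytes
    double x;
    char   pad2[8];  // 8 unused bytes
    double y;
    char   pad3[8];  // 8 unused bytes
    double z;
    double d;
};
#pragma pack(pop)

static_assert(sizeof(Record) == 52, "Record must match the on-disk layout");

// Reads up to n records with a single read() call.
// Returns the records actually read (fewer than n if the file is short,
// none if the file cannot be opened).
std::vector<Record> read_records(const std::string& path, std::size_t n) {
    std::vector<Record> records(n);
    std::ifstream fid(path, std::ios::binary);
    if (!fid) {
        records.clear();
        return records;
    }
    fid.read(reinterpret_cast<char*>(records.data()),
             static_cast<std::streamsize>(n * sizeof(Record)));
    // gcount() reports how many bytes the last read() delivered.
    records.resize(static_cast<std::size_t>(fid.gcount()) / sizeof(Record));
    assert(records.size() <= n);
    return records;
}
```

Once the records are in memory, the per-point arithmetic from the question (d = (d - p)/p and z *= cc) can run over the vector without touching the file again.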

huysentruitw answered Dec 12 '22 02:12


I personally would (at least) do this:

for (int i=0;i<n;i++) {
    char dummy[8];
    fid.read(dummy,4);
    fid.read((char*) &x[i],8);
    fid.read(dummy,8);
    fid.read((char*) &y[i],8);
    fid.read(dummy,8);
    fid.read((char*) &z[i],8);
    fid.read((char*) &d[i],8);
    d[i] = (d[i] - p)/p;
    z[i] *= cc;
}

Using a struct, or reading large amounts of data in one go (say, adding a second layer that reads 4 KB at a time, plus a pair of functions that "skip" and "fetch" the individual fields), would be a bit more work but likely much faster.
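A rough sketch of what such a second layer could look like follows. ChunkReader, Point and read_points are hypothetical names, the fields are assumed to be double, and the 4 KB chunk size is just the figure mentioned above, not a tuned value. Note that std::ifstream already buffers internally, so whether this helps at all is something to measure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <istream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: pulls data from the stream in large chunks and serves
// small skip/fetch requests from memory.
class ChunkReader {
public:
    explicit ChunkReader(std::istream& in, std::size_t chunk_size = 4096)
        : in_(in), buf_(chunk_size) {
        assert(chunk_size > 0);
    }

    // Discards n bytes. Returns false if the stream ends first.
    bool skip(std::size_t n) { return consume(nullptr, n); }

    // Copies sizeof(T) raw bytes into value. Returns false if the stream ends first.
    template <typename T>
    bool fetch(T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "fetch() copies raw bytes");
        return consume(reinterpret_cast<char*>(&value), sizeof(T));
    }

private:
    bool consume(char* dst, std::size_t n) {
        while (n > 0) {
            if (pos_ == len_ && !refill()) return false;
            const std::size_t take = std::min(n, len_ - pos_);
            if (dst != nullptr) {
                std::memcpy(dst, buf_.data() + pos_, take);
                dst += take;
            }
            pos_ += take;
            n -= take;
        }
        return true;
    }

    // Reads the next chunk; returns false when no more bytes are available.
    bool refill() {
        in_.read(buf_.data(), static_cast<std::streamsize>(buf_.size()));
        len_ = static_cast<std::size_t>(in_.gcount());
        pos_ = 0;
        return len_ > 0;
    }

    std::istream& in_;
    std::vector<char> buf_;
    std::size_t pos_ = 0;
    std::size_t len_ = 0;
};

struct Point { double x, y, z, d; };  // field types are an assumption

// Reads up to n points using the record layout from the question.
std::vector<Point> read_points(const std::string& path, std::size_t n) {
    std::vector<Point> pts;
    std::ifstream fid(path, std::ios::binary);
    ChunkReader r(fid);
    Point p;
    while (pts.size() < n &&
           r.skip(4) && r.fetch(p.x) &&
           r.skip(8) && r.fetch(p.y) &&
           r.skip(8) && r.fetch(p.z) && r.fetch(p.d)) {
        pts.push_back(p);
    }
    return pts;
}
```

The loop body keeps the same skip/read sequence as the original code, which makes it easy to check against the file format.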

Another option is to use mmap on Linux or MapViewOfFile on Windows. This reduces the overhead of reading a file somewhat, since one less copy is required to transfer the data to the application.
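As an illustration of the mmap variant, here is a POSIX-only sketch (it will not compile on Windows, where CreateFileMapping and MapViewOfFile would take the place of mmap). Point and read_points_mmap are hypothetical names, and the fields are again assumed to be double in native byte order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

struct Point { double x, y, z, d; };  // field types are an assumption

// Maps the file read-only and decodes up to n records of 52 bytes each.
// Returns an empty vector if the file cannot be opened or mapped.
std::vector<Point> read_points_mmap(const char* path, std::size_t n) {
    constexpr std::size_t kRecordSize = 52;
    std::vector<Point> pts;

    const int fd = open(path, O_RDONLY);
    if (fd < 0) return pts;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return pts;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);

    void* map = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (map == MAP_FAILED) return pts;

    const char* base = static_cast<const char*>(map);
    const std::size_t count = std::min(n, size / kRecordSize);
    assert(count * kRecordSize <= size);
    pts.resize(count);
    for (std::size_t i = 0; i < count; ++i) {
        const char* rec = base + i * kRecordSize;
        // memcpy avoids unaligned double loads; offsets follow the question's layout.
        std::memcpy(&pts[i].x, rec + 4,  sizeof(double));
        std::memcpy(&pts[i].y, rec + 20, sizeof(double));
        std::memcpy(&pts[i].z, rec + 36, sizeof(double));
        std::memcpy(&pts[i].d, rec + 44, sizeof(double));
    }

    munmap(map, size);
    return pts;
}
```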

Edit: I should add: make sure you make comparative measurements. If your application is meant to run on many machines, measure on more than one type of machine, with different combinations of disk drive, processor and memory. You don't really want to tweak the code so that it runs 50% faster on your machine but 25% slower on another.
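One possible way to make such comparisons is a small timing helper like the sketch below. time_runs_ms is a hypothetical name; it uses std::chrono::steady_clock to measure wall-clock time and returns every run rather than an average, since the first run may include real disk access while later runs may be served from the operating system's file cache.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls fn `repeats` times and returns the wall-clock
// duration of each call in milliseconds, in call order.
template <typename Fn>
std::vector<double> time_runs_ms(Fn&& fn, int repeats = 5) {
    assert(repeats >= 0);
    using clock = std::chrono::steady_clock;
    std::vector<double> times;
    times.reserve(static_cast<std::size_t>(repeats));
    for (int i = 0; i < repeats; ++i) {
        const auto start = clock::now();
        fn();
        const std::chrono::duration<double, std::milli> elapsed =
            clock::now() - start;
        times.push_back(elapsed.count());
    }
    return times;
}
```

Each read variant can be wrapped in a lambda and passed in, so the seekg loop, the dummy-read loop and the bulk read are all timed the same way on each machine.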

Mats Petersson answered Dec 12 '22 03:12