In my program I read in a file (for now just a test file of about 200k data points; later there will be millions). Currently what I do is:
for (int i=0;i<n;i++) {
fid.seekg(4,ios_base::cur);
fid.read((char*) &x[i],8);
fid.seekg(8,ios_base::cur);
fid.read((char*) &y[i],8);
fid.seekg(8,ios_base::cur);
fid.read((char*) &z[i],8);
fid.read((char*) &d[i],8);
d[i] = (d[i] - p)/p;
z[i] *= cc;
}
where n denotes the number of points to read in.
Afterwards I write them again with
for(int i=0;i<n;i++){
fid.write((char*) &d[i],8);
fid.write((char*) &z[i],8);
temp = (d[i] + 1) * p;
fid.write((char*) &temp,8);
}
The writing is faster than the reading (time measured with clock_t).
My question is: have I made some rather stupid mistake with the reading, or is this behavior to be expected?
I'm using Win XP with a magnetic drive.
yours magu_
You're using seekg too often. I see that you're using it to skip bytes, but you could just as well read the complete record into a buffer and then skip the bytes in the buffer:
char buffer[52];
for (int i=0;i<n;i++) {
fid.read(buffer, sizeof(buffer));
memcpy(&x[i], &buffer[4], sizeof(x[i]));
memcpy(&y[i], &buffer[20], sizeof(y[i]));
// etc
}
Better yet, you can define a struct that represents the layout of a record in your file:
#pragma pack(push, 1)
struct Item
{
char dummy1[4]; // skip 4 bytes
__int64 x;
char dummy2[8]; // skip 8 bytes
__int64 y;
char dummy3[8]; // skip 8 bytes
__int64 z;
__int64 d;
};
#pragma pack(pop)
then declare an array of those structs and read all the data at once:
Item* items = new Item[n];
fid.read((char*) items, n * sizeof(Item)); // reading all data at once will be amazingly fast
(remark: I don't know the types of x, y, z and d, so I assume __int64 here)
I personally would (at least) do this:
for (int i=0;i<n;i++) {
char dummy[8];
fid.read(dummy,4);
fid.read((char*) &x[i],8);
fid.read(dummy,8);
fid.read((char*) &y[i],8);
fid.read(dummy,8);
fid.read((char*) &z[i],8);
fid.read((char*) &d[i],8);
d[i] = (d[i] - p)/p;
z[i] *= cc;
}
Using a struct, or reading large amounts of data in one go (say, adding a second layer where you read 4KB at a time, then using a pair of functions that "skip" and "fetch" the different fields), would be a bit more work, but likely much faster.
Another option is to use mmap in Linux or MapViewOfFile in Windows. This reduces the overhead of reading a file somewhat, since one less copy is required to transfer the data to the application.
Edit: I should add "Make sure you make comparative measurements", and if your application is meant to run on many machines, make sure you make measurements on more than one type of machine, with different alternatives of disk drive, processor and memory. You don't really want to tweak the code so that it runs 50% faster on your machine, but 25% slower on another machine.