The purpose of my program is to open a text file of m lines of the same length n, read the file column by column and print each column.
For example, for this text file
abcd
efgh
jklm
I would like to print
a e j
b f k
c g l
d h m
As one line length can be 200 000 000 and the column length can be more than 10 000, I can't open all the file in memory in a matrix.
Theoretically, I would like to have a program that use O(m) in space and O(m*n) in time.
At the beginning, I had to think about these solutions:
Last point, for some server problems, I need to use only the STL.
My last idea is to create an array of iterators of a file and initialized these iterators at the beginning of each line. After that, to see the next column, I only need to increase each iterator. This is my code
ifstream str2;
str2.open ("Input/test.data", ifstream::in);
int nbline = 3;
int nbcolumn = 4;
int x = 0;
istreambuf_iterator<char> istart (str2);
istreambuf_iterator<char> iend ;
istreambuf_iterator<char>* iarray;
iarray = new istreambuf_iterator<char>[nbline];
while (istart != iend){
if (x % nbcolumn == 0){
iarray[x/nbcolumn] = istart;
}
istart++;
x++;
}
for (int j = 0; j<nbcolumn;j++){
for (int i = 0; i<nbline;i++){
cout << *iarray[i] << "\t";
iarray[i]++;
}
cout << endl;
}
Sadly, it does not work, and I have this thing as output
a e f
� � �
� � �
� � �
I think the problem is that the array of iterators iarray are not independent of istart, how I can do that?
You can break the task into chunks, then process each chunk before moving on to the next.
You'd need a buffer for each line (the larger this is the better the performance will be) and the seek position for that row. You may also need to make an initial pass thru the file to get the correct offsets for each row.
Read B bytes into the buffer for each row (using tellg
to save the position in each row), then loop over those and generate your output. Go back and read the next B bytes from each row (using seekg
to set the file position beforehand, and tellg
to remember it afterwards) and generate the output. Repeat until you're done, being careful with the last chunk (or with small inputs) to not go past the end of the line.
Using your example, you have 3 rows to keep track of. Using a B size of 2, you'd read in ab
, ef
, and jk
into your 3 buffers. Looping over those you'd output aej
and bfk
. Go back and read the next chunks: cd
, gh
, and lm
. This gives cgl
and dhm
as output.
I would do this like this:
IMO you can't do it better. Most critical will be how to select size of the square. Big power of 2 is recommended.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With