Multi stream iterators c++

Question

The purpose of my program is to open a text file of m lines of the same length n, read the file column by column and print each column.

For example, for this text file

abcd
efgh 
jklm

I would like to print

a e j
b f k
c g l
d h m

As one line length can be 200 000 000 and the column length can be more than 10 000, I can't open all the file in memory in a matrix.

Theoretically, I would like to have a program that use O(m) in space and O(m*n) in time.

At the beginning, I had to think about these solutions:

if I see all the file for each column the complexity is O(m*n²),
If I use seekg and an array of positions and jump from position to position, the complexity is O(mnlog(n)).

Last point, for some server problems, I need to use only the STL.

My last idea is to create an array of iterators of a file and initialized these iterators at the beginning of each line. After that, to see the next column, I only need to increase each iterator. This is my code

ifstream str2;
str2.open ("Input/test.data", ifstream::in);

int nbline = 3;
int nbcolumn = 4;
int x = 0;

istreambuf_iterator<char> istart (str2);
istreambuf_iterator<char> iend ;

istreambuf_iterator<char>* iarray;
iarray = new istreambuf_iterator<char>[nbline];


while (istart != iend){
    if (x % nbcolumn == 0){
        iarray[x/nbcolumn] = istart;
    }
    istart++;
    x++;
}

for (int j = 0; j<nbcolumn;j++){
    for (int i = 0; i<nbline;i++){
        cout  << *iarray[i] << "	";
        iarray[i]++;
    }
    cout << endl;
}

Sadly, it does not work, and I have this thing as output

a       e       f       
�       �       �       
�       �       �       
�       �       �

I think the problem is that the array of iterators iarray are not independent of istart, how I can do that?

1201ProgramAlarm · Accepted Answer

You can break the task into chunks, then process each chunk before moving on to the next.

You'd need a buffer for each line (the larger this is the better the performance will be) and the seek position for that row. You may also need to make an initial pass thru the file to get the correct offsets for each row.

Read B bytes into the buffer for each row (using tellg to save the position in each row), then loop over those and generate your output. Go back and read the next B bytes from each row (using seekg to set the file position beforehand, and tellg to remember it afterwards) and generate the output. Repeat until you're done, being careful with the last chunk (or with small inputs) to not go past the end of the line.

Using your example, you have 3 rows to keep track of. Using a B size of 2, you'd read in ab, ef, and jk into your 3 buffers. Looping over those you'd output aej and bfk. Go back and read the next chunks: cd, gh, and lm. This gives cgl and dhm as output.

Marek R · Answer

I would do this like this:

Open the source file.
Measure line size
Measure line count (file size / (line size + size of EOL)). Note EOL can be 2 bytes.
calculate result file size. Open result file and force it to have it desired size, so later you can seek to any part of the file.
peak some size of square which is memory manageable. For example 1024x1024
Now you should load square part of the matrix. 1024 elements for rows of 1024 constitutive rows.
Transpose square
Write it to destination file by seeking to proper column of each part of row you are writing. (you can reduce memory consumption in previous point by transposing one column and then write it as a row, instead transposing whole square at once)
Iterate square over whole file matrix

IMO you can't do it better. Most critical will be how to select size of the square. Big power of 2 is recommended.

Multi stream iterators c++

Tags:

c++

stream

istream-iterator

B. Hel

2 Answers

1201ProgramAlarm

Marek R

Recent Activity

Donate For Us

Multi stream iterators c++

Tags:

c++

stream

istream-iterator

B. Hel

2 Answers

1201ProgramAlarm

Marek R

Related questions

Recent Activity

Donate For Us