
How exactly does the extraction operator>> work in C++?

I am a computer science student, and so do not have much experience with the C++ language (it is my first semester using it), or with coding for that matter.

I was given an assignment to read integers from a text file in the simple form of:

19 3 -2 9 14 4
5 -9 -10 3
.
.
.

This sent me off on a journey to understand I/O operators better, since I am required to do certain things with this stream (duh).

I looked everywhere and could not find a simple explanation of how the extraction operator>> works internally. Let me clarify my question:

I know that the extraction operator>> extracts one contiguous element until it hits a space, tab, or newline. What I am trying to figure out is where the pointer(?) or read location(?) will be AFTER it extracts an element. Will it be on the last char of the element just removed, or was that char removed and therefore gone? Will it be on the space/tab/'\n' character itself? Perhaps at the beginning of the next element to extract?

I hope I was clear enough. I lack the appropriate jargon to describe my problem more clearly.


Here is why I need to know this (in case anyone is wondering): one of the requirements is to sum all integers in each line separately. I have created a loop to extract all integers one by one until it reaches the end of the file. However, I soon learned that the extraction operator>> skips spaces, tabs, and newlines. What I want to try is to extract an element with >>, and then use inputFile.get() to get the space/tab/newline. Then, if it's a newline, do what I gotta do. This will only work if the stream pointer is in a good position to read the space/tab/newline right after the last extraction.


In my previous question, I tried to solve it using getline() and a stringstream.


SOLUTION:

For the sake of answering my specific question of how operator>> works, I had to accept Ben Voigt's answer as the best one. I have used the other solutions suggested here (using a stringstream for each line) and they did work! (You can see it in my previous question's link.) However, I implemented another solution using Ben's answer and it also worked:

        .
        .
        .

if(readFile.is_open()) {
        while (readFile >> newInput) {
                char isNewLine = readFile.get();    //get() the next char after extraction

                if(isNewLine == '\n')               //This is just a test!
                        cout << isNewLine;          //If it's a newline, feed a newline.
                else
                        cout << "X" << isNewLine;   //Else, show X & feed a space or tab

                lineSum += newInput;
                allSum += newInput;
                intCounter++;
                minInt = min(minInt, newInput);
                maxInt = max(maxInt, newInput);

                if(isNewLine == '\n') {
                        lineCounter++;
                        statFile << "The sum of line " << lineCounter
                                 << " is: " << lineSum << endl;
                        lineSum = 0;
                }
        }
        .
        .
        .

Regardless of my numerical values, the format is correct! Both spaces and '\n's were caught.

Thank you Ben Voigt :)

Nonetheless, this solution is very format dependent and very fragile. If any line has anything else before the '\n' (like a space or tab), the code will miss the newline char. Therefore, the other solution, using getline() and a stringstream per line, is much more reliable.

Gil Dekel asked Oct 03 '14 15:10



4 Answers

After extraction, the stream pointer will be placed on the whitespace that caused extraction to terminate (or other illegal character, in which case the failbit will also be set).

This doesn't really matter though, since you aren't responsible for skipping over that whitespace. The next extraction will skip whitespace until it finds valid data.

In summary:

  • leading whitespace is ignored
  • trailing whitespace is left in the stream

There's also the noskipws modifier which can be used to change the default behavior.

Ben Voigt answered Sep 30 '22 01:09


The operator>> leaves the current position in the file one character beyond the last character extracted (which may be at end of file). Which doesn't necessarily help with your problem; there can be spaces or tabs after the last value in a line. You could skip forward reading each character and checking whether it is a white space other than '\n', but a far more idiomatic way of reading line oriented input is to use std::getline to read the line, then initialize an std::istringstream to extract the integers from the line:

std::string line;
while ( std::getline( source, line ) ) {
    std::istringstream values( line );
    //  ...
}

This also ensures that in case of a format error in the line, the error state of the main input is unaffected, and you can continue with the next line.

James Kanze answered Sep 30 '22 01:09


According to cppreference.com the standard operator>> delegates the work to std::num_get::get. This takes an input iterator. One of the properties of an input iterator is that you can dereference it multiple times without advancing it. Thus when a non-numeric character is detected, the iterator will be left pointing to that character.

Mark Ransom answered Sep 30 '22 03:09


In general, the behavior of an istream is not set in stone. There exist multiple flags to change how any istream behaves, which you can read about here. In general, you should not really care where the internal pointer is; that's why you are using a stream in the first place. Otherwise you'd just dump the whole file into a string or equivalent and manually inspect it.

Anyway, going back to your problem, a possible approach is to use std::getline to read each line into a string. You can then either parse the string manually, or wrap it in a stringstream and extract tokens from there.

Example:

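#include <fstream>
#include <sstream>
#include <string>

int main() {
    std::ifstream ifs("myFile");
    std::string str;

    // Read the file one line at a time.
    while ( std::getline(ifs, str) ) {
        std::stringstream ss( str );   // parse only the current line
        double sum = 0.0, value;
        while ( ss >> value ) sum += value;
        // Process sum
    }
}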

Svalorzen answered Sep 30 '22 02:09