Why is the beginning of my string disappearing?

In the following C++ code, I realised that gcount() was returning a larger number than I wanted, because getline() consumes the final newline character but doesn't store it in the buffer.

What I still don't understand is the program's output, though. For input "Test\n", why do I get " est\n"? How come my mistake affects the first character of the string rather than adding unwanted rubbish onto the end? And how come the program's output is at odds with the way the string looks in the debugger ("Test\n", as I'd expect)?

#include <fstream>
#include <vector>
#include <string>
#include <iostream>

using namespace std;

int main()
{
    const int bufferSize = 1024;
    ifstream input( "test.txt", ios::in | ios::binary );

    vector<char> vecBuffer( bufferSize );
    input.getline( &vecBuffer[0], bufferSize );
    string strResult( vecBuffer.begin(), vecBuffer.begin() + input.gcount() );
    cout << strResult << "\n";

    return 0;
}
Tommy Herbert asked Jun 24 '09 16:06


2 Answers

I've also duplicated this result, Windows Vista, Visual Studio 2005 SP2.

When I figure out what the heck is happening, I'll update this post.

edit: Okay, there we go. The problem (and the different results people are getting) comes from the \r. What happens is you call input.getline and put the result in vecBuffer. Because the file is opened in binary mode, the getline function strips off the \n, but leaves the \r in place.

You then copy vecBuffer into a string, but take the length from input.gcount(). That gives you one char too many, because gcount() counts the \n that getline extracted from the stream, while vecBuffer holds only "Test\r" followed by the terminating null.

The resulting strResult is:

-       strResult   "Test"
        [0] 84 'T'  char
        [1] 101 'e' char
        [2] 115 's' char
        [3] 116 't' char
        [4] 13 '␍'  char
        [5] 0   char

So then "Test" is printed, followed by a carriage return (which puts the cursor back at the start of the line), a null character (which the console draws as a blank, overwriting the T), and finally the \n, which correctly puts the cursor on the new line.

So you either have to strip out the \r, or write a function that gets the string length directly from vecBuffer, checking for null characters.

Aistina answered Oct 28 '22 22:10


I've duplicated Tommy's problem on a Windows XP Pro Service Pack 2 system with the code compiled using Visual Studio 2005 SP2 (actually, it says "Version 8.0.50727.879"), built as a console project.

If my test.txt file contains just "Test" and a Windows line ending (CR LF), the program spits out " est" (note the leading space) when run.

If I had to take a wild guess, I'd say that this version of the implementation has a bug where it is treating the Windows newline character like it should be treated in Unix (as a "go to the front of the same line" character), and then it wipes out the first character to hold part of the next prompt or something.


Update: After playing with it a bit, I'm positive that is what is going on. If you look at strResult in the debugger, you will see that it copied over a decimal 13 value at the end, followed by a 0. The 13 is CR ('\r'), which in Windows-land is the first half of the "\r\n" line ending, and which a console on its own treats as "return to the beginning of the line". If I instead change your constructor to read:

string strResult( vecBuffer.begin(), vecBuffer.begin() + input.gcount() - 1 );

...(so that the trailing null isn't copied; the CR still is, but the newline follows it immediately and nothing gets overwritten) then it prints out "Test" like you'd expect.

T.E.D. answered Oct 28 '22 22:10