
Problem with getline and "strange characters"

Tags: c++, wstring

I have a strange problem, I use

wifstream a("a.txt");
wstring line;
while (a.good()) // !a.eof() is not helping
{
    getline(a, line);
    //...
    wcout << line << endl;
}

and it works nicely for a txt file like this one: http://www.speedyshare.com/files/29833132/a.txt (sorry for the link, but it is just 80 bytes, so it shouldn't be a problem to get it; if I copy/paste it on SO the newlines get lost). BUT when I add, for example, 水 (from http://en.wikipedia.org/wiki/UTF-16/UCS-2#Examples ) to any line, loading stops at that line. I was under the wrong impression that a getline that takes a wstring as one input and a wifstream as the other can chew any txt input... Is there any way to read every single line in the file even if it contains funky characters?

NoSenseEtAl asked Aug 12 '11 12:08




1 Answer

The not-very-satisfying answer is that you need to imbue the input stream with a locale which understands the particular character encoding of the file. If you don't know which locale to choose, you can use the locale named by the empty string, std::locale(""), which picks up the user's preferred locale from the environment.

For example (untested):

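std::wifstream a("a.txt");
std::locale loc("");  // the user's preferred locale, taken from the environment
a.imbue(loc);         // imbue before reading anything from the stream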

Unfortunately, there is no standard way to determine what locales are available for a given platform, let alone select one based on the character encoding.

The above code puts the locale selection in the hands of the user, and if they set it to something plausible (e.g. en_AU.UTF-8) it might all Just Work.
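To make the idea concrete, here is a slightly fuller sketch of the same approach. The helper name read_lines and its parameters are made up for illustration; they are not part of the standard library or of the original question. It also tests the result of getline directly rather than calling good() beforehand, so a failed read never produces a stale line.

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of `path`, decoding the file's
// bytes with the character conversion facet of `loc`.
std::vector<std::wstring> read_lines(const std::string& path,
                                     const std::locale& loc) {
    std::wifstream in;
    in.imbue(loc);            // set the locale before any characters are read
    in.open(path.c_str());
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    std::vector<std::wstring> lines;
    std::wstring line;
    // Loop on getline itself: it stops at end of file or on a read error.
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

A call such as read_lines("a.txt", std::locale("")) would then use whatever locale the user has configured. Note that std::locale("") throws std::runtime_error if the environment names a locale the system does not provide, so a caller may want to catch that and report it.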

Failing this, you probably need to resort to third-party libraries such as iconv or ICU.
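As a rough illustration of the iconv route, the sketch below converts a byte string to a wstring without involving stream locales. It rests on several assumptions: the input is UTF-8, the platform provides POSIX iconv with the glibc-style signature (char** for the input buffer), and the implementation accepts the encoding name "WCHAR_T" (glibc and GNU libiconv do). The helper name utf8_to_wide is hypothetical.

```cpp
#include <iconv.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 bytes to the platform's wchar_t
// encoding using iconv. Assumes the "WCHAR_T" encoding name is supported.
std::wstring utf8_to_wide(const std::string& in) {
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)(-1)) {
        throw std::runtime_error("iconv_open failed");
    }

    // Each UTF-8 byte produces at most one wchar_t, so this is large enough.
    std::wstring out(in.size(), L'\0');

    char* src = const_cast<char*>(in.data());
    std::size_t src_left = in.size();
    char* dst = reinterpret_cast<char*>(&out[0]);
    std::size_t dst_left = out.size() * sizeof(wchar_t);

    std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("input is not valid UTF-8");
    }

    // Trim the part of the buffer that iconv did not fill.
    out.resize(out.size() - dst_left / sizeof(wchar_t));
    return out;
}
```

With something like this, the file can be read line by line through a plain std::ifstream into std::string, and each line converted afterwards, which sidesteps the question of which locales are installed.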

Also relevant: this blog entry (apologies for the self-promotion).

Alastair answered Sep 27 '22 22:09
