
Shift-JIS decoding fails using wifstream in Visual C++ 2013

I am trying to read a text file encoded in Shift-JIS (code page 932) using std::wifstream and std::getline. The following code works in VS2010 but fails in VS2013:

std::wifstream in;
in.open("data932.txt");

const std::locale locale(".932");

in.imbue(locale);

std::wstring line1, line2;
std::getline(in, line1);
std::getline(in, line2);
const bool good = in.good();

The file contains several lines, where the first line contains just ASCII characters, and the second is Japanese script. Thus, when this snippet runs, line1 should contain the ASCII line, line2 the Japanese script, and good should be true.

When compiled in VS2010, the result is as expected. When compiled in VS2013, line1 still contains the ASCII line, but line2 is empty and good is false.

I debugged into the CRT (its source is provided with Visual Studio) and found that an internal function called _Mbrtowc (in file xmbtowc.c) was modified between the two versions. The way it detects the lead byte of a double-byte character was changed, and the VS2013 version fails to detect the lead byte, so it fails to decode the byte stream.

Further debugging led me to the place where a _Cvtvec object's _Isleadbyte array is initialized (in the function _Getcvt(), in file xwctomb.c); that initialization produces a wrong result. It seems to always use code page 1252, which is the default code page on my system, rather than 932, which is set for the stream in use. However, I could not decide whether this is by design and I missed some step required to get a correct result, or whether it is indeed a bug in the CRT for VS2013.
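To illustrate why a wrong lead-byte table breaks decoding, here is a minimal sketch. It is not the CRT code; the function names are made up for illustration. It assumes the documented lead-byte ranges of code page 932 (0x81-0x9F and 0xE0-0xFC) and compares them with a single-byte code page such as 1252, where no byte is a lead byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Lead-byte ranges of Windows code page 932 (Shift-JIS): 0x81-0x9F and 0xE0-0xFC.
bool is_cp932_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Code page 1252 is a single-byte code page: no byte is ever a lead byte.
bool is_cp1252_lead_byte(unsigned char) {
    return false;
}

// Counts the characters in a byte string, using the given lead-byte test
// to decide whether a byte starts a two-byte character.
std::size_t count_chars(const std::string& bytes, bool (*is_lead)(unsigned char)) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i, ++n) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (is_lead(b) && i + 1 < bytes.size()) {
            ++i;  // skip the trail byte of a double-byte character
        }
    }
    return n;
}

// Usage: the three characters of "日本語" encoded in Shift-JIS (six bytes).
void demo_lead_bytes() {
    const std::string nihongo("\x93\xFA\x96\x7B\x8C\xEA");
    assert(count_chars(nihongo, is_cp932_lead_byte) == 3);   // correct table
    assert(count_chars(nihongo, is_cp1252_lead_byte) == 6);  // wrong table
}
```

With the 1252 table every byte is treated as a character of its own, so a decoder cannot pair the lead and trail bytes. Note that the trail byte 0x7B is also the ASCII code for '{', which is why the lead byte has to be recognized first.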

Unfortunately I don't have VS2012 installed, so I could not test on that version.

Any insights on this topic are welcome!

Peter B asked Nov 01 '22 15:11


1 Answer

I have found a workaround: if I explicitly change the global multibyte code page (with _setmbcp) while the locale is being created, the locale is initialized correctly, and the lines are read and decoded as expected.

const int oldMbcp = _getmbcp();
_setmbcp(932);
const std::locale locale("Japanese_Japan.932");
_setmbcp(oldMbcp);
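
One caveat with the snippet above: if the std::locale constructor throws (it throws std::runtime_error for an unknown locale name), the old code page is never restored. A possible variant wraps the switch in a small RAII guard. This is only a sketch: ScopedMbcp, the mbcp namespace and make_cp932_locale are hypothetical names, and the non-MSVC branch is a stand-in that merely stores the value so the code compiles on other toolchains.

```cpp
#include <locale>

#ifdef _MSC_VER
#include <mbctype.h>  // _getmbcp, _setmbcp (MSVC CRT)
#endif

namespace mbcp {
#ifdef _MSC_VER
inline int get() { return _getmbcp(); }
inline void set(int cp) { _setmbcp(cp); }
#else
// Hypothetical stand-ins for non-MSVC builds: they only remember the value.
inline int& fake() { static int cp = 1252; return cp; }
inline int get() { return fake(); }
inline void set(int cp) { fake() = cp; }
#endif
}  // namespace mbcp

// Hypothetical guard: switches the global multibyte code page on construction
// and restores the previous one on destruction, even if an exception is thrown.
class ScopedMbcp {
public:
    explicit ScopedMbcp(int cp) : old_(mbcp::get()) { mbcp::set(cp); }
    ~ScopedMbcp() { mbcp::set(old_); }
    ScopedMbcp(const ScopedMbcp&) = delete;
    ScopedMbcp& operator=(const ScopedMbcp&) = delete;

private:
    int old_;
};

// Creates the locale while 932 is the global multibyte code page,
// mirroring the workaround above.
inline std::locale make_cp932_locale() {
    ScopedMbcp guard(932);
    return std::locale("Japanese_Japan.932");
}
```

The stream would then be set up with in.imbue(make_cp932_locale()), as in the question.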
Peter B answered Nov 15 '22 07:11