Consider the following example code:
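#include <iostream>
#include <iterator>  // std::istreambuf_iterator
using namespace std;
int main()
{
    istreambuf_iterator<char> eos;               // default-constructed: end of stream
    istreambuf_iterator<char> iit(cin.rdbuf());  // reads raw chars from cin's buffer
    int i;
    for (i = 0; iit != eos; ++i, ++iit) {
        cout << *iit;
    }
    cout << endl << i << endl;                   // number of chars read
}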
And an input file containing the following: "foo\xffbar":
$ hexdump testin
0000000 66 6f 6f ff 62 61 72
0000007
Now for the test, using Clang with libc++ vs. GNU libstdc++:
$ make test
clang++ -std=c++11 -stdlib=libc++ -Wall -stdlib=libc++ -o bug-libcc bug.cpp
clang++ -std=c++11 -stdlib=libc++ -Wall -stdlib=libstdc++ -o bug-libstd bug.cpp
./bug-libcc < testin
foo
3
./bug-libstd < testin
foo�bar
7
As you can see, the libc++ version treats the 0xff byte as the end of the stream and stops reading. This leads to a couple of questions.
1) Is this a bug in libc++ that I should report? My Google searches for existing bugs have turned up nothing.
2) Is there a good way to work around this issue?
EDIT
The following code works:
#include <iostream>
#include <fstream>
#include <iterator>  // std::istreambuf_iterator
using namespace std;
int main()
{
    ifstream ifs("testin", ios::binary);         // read the file directly instead of cin
    istreambuf_iterator<char> eos;
    istreambuf_iterator<char> iit(ifs.rdbuf());
    int i;
    for (i = 0; iit != eos; ++i, ++iit) {
        cout << *iit;
    }
    cout << endl << i << endl;
}
This leads me to believe that it is a binary conversion issue, but that doesn't explain why libstdc++ works properly.
EDIT2
Using a file without binary works fine too:
ifstream ifs ("testin");
So there is definitely something fishy going on. It looks like it might be an issue in the implementation of cin though, not the iterator.
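On question 2, one possible workaround (a minimal sketch, not something taken from the posts here) is to skip cin entirely and pull the bytes from the C stdio stream with fread. The helper name read_all_bytes is hypothetical. It assumes nothing else in the program reads from cin, and that stdin is not subject to text-mode translation on your platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the original post): reads every remaining
// byte from a C stdio stream into a std::string without touching std::cin.
std::string read_all_bytes(std::FILE* in)
{
    assert(in != nullptr);
    std::string data;
    char buf[4096];
    std::size_t n;
    // fread reports how many bytes it read, so no byte value (0xff included)
    // can be mistaken for an end-of-file marker.
    while ((n = std::fread(buf, 1, sizeof buf, in)) > 0) {
        data.append(buf, n);
    }
    return data;
}
```

Calling read_all_bytes(stdin) and printing the size of the result plays the same role as the counter i in the test program above.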
Unfortunately there is still a bug in libc++ (in addition to the one ecatmur pointed out). Here is the fix:
Index: include/__std_stream
===================================================================
--- include/__std_stream (revision 176092)
+++ include/__std_stream (working copy)
@@ -150,7 +150,7 @@
{
for (int __i = __nread; __i > 0;)
{
- if (ungetc(__extbuf[--__i], __file_) == EOF)
+ if (ungetc(traits_type::to_int_type(__extbuf[--__i]), __file_) == EOF)
return traits_type::eof();
}
}
I will get this checked in ASAP. Sorry for the bug. Thanks for bringing it to my attention.
Fix committed as revision 176822 to the libcxx public svn trunk. The fix requires a re-compiled dylib even though the fix is in a header.
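To illustrate why the to_int_type call in the patch matters, here is a small sketch. It is not the libc++ code. ungetc fails and returns EOF when its argument equals EOF, so a 0xff byte that has been sign-extended to -1 cannot be pushed back. The function names are hypothetical, and signed char is used explicitly as an assumption so the demonstration does not depend on whether plain char is signed on your compiler.

```cpp
#include <cassert>
#include <cstdio>
#include <string>  // std::char_traits

// Models the pre-fix call: the byte is passed to ungetc without widening,
// so a signed 0xff arrives as -1, which equals EOF on typical platforms.
int push_back_unwidened(std::FILE* f, signed char c)
{
    assert(f != nullptr);
    return std::ungetc(c, f);
}

// Models the fixed call: to_int_type maps the byte into 0..255 first,
// so it can never collide with EOF.
int push_back_widened(std::FILE* f, char c)
{
    assert(f != nullptr);
    return std::ungetc(std::char_traits<char>::to_int_type(c), f);
}
```

On a platform where EOF is -1, the first function reports failure for the byte 0xff, while the second pushes it back and a following getc returns 255.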
I think you might have found a bug that has already been fixed. This commit (by @Howard Hinnant) contains the following changes:
@@ -104,7 +104,7 @@
int __nread = _VSTD::max(1, __encoding_);
for (int __i = 0; __i < __nread; ++__i)
{
- char __c = getc(__file_);
+ int __c = getc(__file_);
if (__c == EOF)
return traits_type::eof();
__extbuf[__i] = static_cast<char>(__c);
@@ -131,7 +131,7 @@
if (__nread == sizeof(__extbuf))
return traits_type::eof();
{
- char __c = getc(__file_);
+ int __c = getc(__file_);
if (__c == EOF)
return traits_type::eof();
__extbuf[__nread] = static_cast<char>(__c);
You'll notice that the older version stored the return value of getc into char, which is a no-no for the precise reason that it confuses the char value 0xff with the int value EOF (i.e., -1).
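The confusion can be reproduced in isolation. The sketch below is not the libc++ source. It models the old and new comparisons with two hypothetical functions, and it uses signed char explicitly (an assumption) so that the result does not depend on how your compiler treats plain char.

```cpp
#include <cassert>
#include <climits>
#include <cstdio>

// Models the old code path: the int returned by getc is narrowed to a
// character type before being compared with EOF.
bool narrowed_looks_like_eof(int getc_result)
{
    // getc returns either EOF or a value representable as unsigned char.
    assert(getc_result == EOF || (getc_result >= 0 && getc_result <= UCHAR_MAX));
    signed char c = static_cast<signed char>(getc_result);
    return c == EOF;  // c is promoted back to int, so 0xff compares as -1
}

// Models the fixed code path: compare while the value is still an int.
bool wide_looks_like_eof(int getc_result)
{
    assert(getc_result == EOF || (getc_result >= 0 && getc_result <= UCHAR_MAX));
    return getc_result == EOF;
}
```

Assuming EOF is -1 (as it is on common platforms), narrowed_looks_like_eof(0xff) is true, which is the premature end of stream seen in the question, while wide_looks_like_eof(0xff) is false.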
The bug applies only to cin because the affected methods are on __stdinbuf, which is the type libc++ uses to implement cin only; ifstream, for example, uses basic_filebuf<char>.
Check the libcxx/include/__std_stream file on your system to see whether it has this bug; if it does, apply the patch and it should fix it.
The iterator is extracting from the stream. The stream needs to be opened in binary mode to prevent any translation of the original data.
Next, don't use char. Whether plain char is signed or unsigned depends on the compiler. I recommend using uint8_t when reading binary octets.
Try something like this:
#include <cstdint>
using std::uint8_t;
istreambuf_iterator<uint8_t> eos;
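A variation on this idea, offered as a sketch under assumptions rather than a drop-in fix: std::cin's buffer is a char-based std::streambuf, so the version below keeps istreambuf_iterator<char> and converts each value to uint8_t as it is read. The name read_octets is hypothetical, and the std::string overload exists only to make the helper easy to try on an in-memory buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical helper: drains a char-based stream buffer and returns its
// contents as unsigned octets.
std::vector<std::uint8_t> read_octets(std::streambuf* sb)
{
    assert(sb != nullptr);
    std::vector<std::uint8_t> out;
    std::istreambuf_iterator<char> it(sb), eos;
    for (; it != eos; ++it) {
        // Convert each char to an unsigned value in 0..255.
        out.push_back(static_cast<std::uint8_t>(*it));
    }
    return out;
}

// Convenience overload for trying the helper on an in-memory buffer.
std::vector<std::uint8_t> read_octets(const std::string& bytes)
{
    std::istringstream in(bytes);
    return read_octets(in.rdbuf());
}
```

With a correct stream buffer underneath, passing the seven bytes of the test input yields seven octets with 0xff at index 3. On an affected libc++, read_octets(std::cin.rdbuf()) would still stop early, since the bug sits in cin's buffer and not in the iterator.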