
How can I read a 0xFF in a file with libc++ istreambuf_iterator?

Consider the following example code:

#include <iostream>
#include <iterator>

using namespace std;

int main()
{
  istreambuf_iterator<char> eos;
  istreambuf_iterator<char> iit(cin.rdbuf());
  int i;
  for (i = 0; iit != eos; ++i, ++iit) {
    cout << *iit;
  }
  cout << endl << i << endl;
}

And an input file containing the seven bytes "foo\xffbar":

$ hexdump testin
0000000 66 6f 6f ff 62 61 72
0000007

Now for the test, building with clang against libc++ and against GNU libstdc++:

$ make test
clang++ -std=c++11 -stdlib=libc++ -Wall -stdlib=libc++ -o bug-libcc bug.cpp
clang++ -std=c++11 -stdlib=libc++ -Wall -stdlib=libstdc++ -o bug-libstd bug.cpp
./bug-libcc < testin
foo
3
./bug-libstd < testin
foo�bar
7

As you can see, the libc++ version treats the 0xff byte as the end of the stream and stops reading. This leads to a couple of questions.

1) Is this a bug in libc++ that I should report? My Google searches for existing bugs have turned up nothing.

2) Is there a good way to work around this issue?

EDIT

The following code works:

#include <fstream>
#include <iostream>
#include <iterator>

using namespace std;

int main()
{
  ifstream ifs ("testin", ios::binary);
  istreambuf_iterator<char> eos;
  istreambuf_iterator<char> iit(ifs.rdbuf());
  int i;
  for (i = 0; iit != eos; ++i, ++iit) {
    cout << *iit;
  }
  cout << endl << i << endl;
}

This led me to believe that it is a binary conversion issue, but that doesn't explain why libstdc++ works properly.

EDIT2

Opening the file without ios::binary works fine too:

ifstream ifs ("testin");

So there is definitely something fishy going on. It looks like the issue might be in the implementation of cin, though, not in the iterator.

vishvananda asked Mar 11 '13 17:03


3 Answers

Unfortunately there is still a bug in libc++ (in addition to the one ecatmur pointed out). Here is the fix:

Index: include/__std_stream
===================================================================
--- include/__std_stream    (revision 176092)
+++ include/__std_stream    (working copy)
@@ -150,7 +150,7 @@
     {
         for (int __i = __nread; __i > 0;)
         {
-            if (ungetc(__extbuf[--__i], __file_) == EOF)
+            if (ungetc(traits_type::to_int_type(__extbuf[--__i]), __file_) == EOF)
                 return traits_type::eof();
         }
     }

I will get this checked in ASAP. Sorry for the bug, and thanks for bringing it to my attention.

Fix committed as revision 176822 to the libcxx public svn trunk. The fix requires a recompiled dylib even though the change is in a header.
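The one-line change in the diff above matters because ungetc takes an int and fails when that int equals EOF. The following is a minimal sketch of the difference, not code from libc++: the helper names as_plain_int, as_int_type and check_to_int_type are hypothetical, signed char stands in for plain char on platforms where char is signed, and EOF is assumed to be -1 as it is on common platforms.

```cpp
#include <cassert>
#include <cstdio>   // EOF
#include <string>   // std::char_traits

using traits = std::char_traits<char>;

// Models the pre-fix call: the byte is promoted to int through a signed
// character type, so 0xFF arrives as -1.
int as_plain_int(unsigned char byte) {
    return static_cast<int>(static_cast<signed char>(byte));
}

// Models the fixed call: to_int_type goes through unsigned char, so every
// byte maps to a value in 0..255 and can never collide with eof().
int as_int_type(unsigned char byte) {
    return traits::to_int_type(static_cast<char>(byte));
}

// Self-check for the byte from the question.
void check_to_int_type() {
    assert(as_plain_int(0xFF) == -1);    // indistinguishable from EOF (assumed -1)
    assert(as_int_type(0xFF) == 255);    // a valid byte value
    assert(!traits::eq_int_type(as_int_type(0xFF), traits::eof()));
}
```

With the plain promotion, pushing back 0xFF looks like pushing back EOF, which ungetc rejects; with to_int_type the byte is pushed back as 255.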

Howard Hinnant answered Oct 14 '22 06:10


I think you might have found a bug that has already been fixed. This commit (by @Howard Hinnant) contains the following changes:

@@ -104,7 +104,7 @@
     int __nread = _VSTD::max(1, __encoding_);
     for (int __i = 0; __i < __nread; ++__i)
     {
-        char __c = getc(__file_);
+        int __c = getc(__file_);
         if (__c == EOF)
             return traits_type::eof();
         __extbuf[__i] = static_cast<char>(__c);
@@ -131,7 +131,7 @@
                 if (__nread == sizeof(__extbuf))
                     return traits_type::eof();
                 {
-                    char __c = getc(__file_);
+                    int __c = getc(__file_);
                     if (__c == EOF)
                         return traits_type::eof();
                     __extbuf[__nread] = static_cast<char>(__c);

You'll notice that the older version stored the return value of getc in a char, which is a no-no precisely because it confuses the char value 0xff with the int value EOF (i.e., -1): where char is signed, the byte 0xff becomes -1 and compares equal to EOF.
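To make the getc pitfall concrete, here is a small sketch that reproduces the 3-versus-7 byte counts from the question with plain C stdio. It is an illustration only, not libc++ code: make_test_file, count_bytes_narrowed and count_bytes_int are hypothetical names, signed char is used so the result does not depend on whether plain char is signed on the host, and EOF is assumed to be -1.

```cpp
#include <cassert>
#include <cstdio>

// Writes the 7 bytes "foo\xffbar" to a temporary file and rewinds it.
std::FILE* make_test_file() {
    std::FILE* f = std::tmpfile();
    assert(f != nullptr);
    const unsigned char data[] = {'f', 'o', 'o', 0xFF, 'b', 'a', 'r'};
    std::fwrite(data, 1, sizeof data, f);
    std::rewind(f);
    return f;
}

// Buggy pattern: the result of getc is narrowed before the EOF test.
int count_bytes_narrowed(std::FILE* f) {
    int n = 0;
    for (;;) {
        signed char c = static_cast<signed char>(std::getc(f));
        if (c == EOF)   // 0xFF has become -1 and compares equal to EOF
            break;
        ++n;
    }
    return n;
}

// Correct pattern: keep the int until after the EOF test.
int count_bytes_int(std::FILE* f) {
    int n = 0;
    for (;;) {
        int c = std::getc(f);
        if (c == EOF)
            break;
        ++n;
    }
    return n;
}
```

Under those assumptions the narrowed loop stops after "foo" while the int loop sees all seven bytes.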

The bug applies only to cin because the affected methods are on __stdinbuf, the type libc++ uses only to implement cin; ifstream, for example, uses basic_filebuf<char>.

Check the libcxx/include/__std_stream file on your system to see whether it has this bug; if it does, apply the patch and it should fix it.
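If patching the header is not an option, one possible workaround (an assumption on my part, not something proposed in this thread) is to bypass cin and read standard input through C stdio. A minimal sketch, with read_all as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Reads every remaining byte from a C stream into a std::string.
// fread reports a byte count rather than a sentinel value, so a 0xFF byte
// cannot be mistaken for end of file.
std::string read_all(std::FILE* in) {
    assert(in != nullptr);
    std::string bytes;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, in)) > 0)
        bytes.append(buf, n);
    return bytes;
}
```

It would be called as read_all(stdin). Mixing this with reads from cin in the same program is best avoided, since the two may buffer independently.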

ecatmur answered Oct 14 '22 06:10


The iterator extracts from the stream.
The stream needs to be opened in binary mode to prevent any translation of the original data.

Next, don't use char. Whether plain char is signed or unsigned is implementation-defined, so it depends on the compiler and platform. I recommend using uint8_t when reading binary octets.

Try something like this:

#include <cstdint>
#include <iterator>
using std::uint8_t;
std::istreambuf_iterator<uint8_t> eos;
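Note that the standard library only guarantees char_traits for the built-in character types, and cin's buffer is a basic_streambuf<char>, so an iterator over uint8_t cannot be attached to it directly. A hedged alternative sketch keeps the char iterator and converts each element to uint8_t at the point of use; read_octets and check_read_octets are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads the rest of a char stream and returns it as unsigned octets.
std::vector<std::uint8_t> read_octets(std::istream& in) {
    std::vector<std::uint8_t> out;
    std::istreambuf_iterator<char> it(in.rdbuf()), eos;
    for (; it != eos; ++it)
        out.push_back(static_cast<std::uint8_t>(*it));  // 0xFF becomes 255
    return out;
}

// Self-check using an in-memory stream holding the bytes from the question.
// The literal is split so that "\xff" is not parsed together with "ba".
void check_read_octets() {
    std::istringstream in(std::string("foo\xff" "bar", 7));
    std::vector<std::uint8_t> octets = read_octets(in);
    assert(octets.size() == 7);
    assert(octets[3] == 0xFF);
}
```

This does not work around the __stdinbuf bug itself; it only addresses the signedness concern when inspecting byte values.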
Thomas Matthews answered Oct 14 '22 07:10