
Can boost iostreams read and decompress gzipped files on the fly?

I am reading a gzipped file using boost iostreams. The following works fine:

  namespace io = boost::iostreams;
  io::filtering_istream in;
  in.push(io::gzip_decompressor());    // typedef for basic_gzip_decompressor<>
  in.push(io::file_source("test.gz"));
  std::stringstream ss;
  io::copy(in, ss);                    // reads the whole decompressed file into ss

However, I don't want to take the memory hit of reading an entire gzipped file into memory. I want to be able to read the file incrementally.

For example, if I have a data structure X that initializes itself from istream,

X x;
x.read(in);

fails. Presumably this is because characters may have to be put back into the stream when it is only read partially. Any ideas whether boost iostreams supports this?

ATemp asked Feb 28 '12 19:02


2 Answers

I think you need to write your own filter. For instance, to read a .tar.gz and output the files it contains, I wrote something like this:

//using namespace std;
namespace io = boost::iostreams;

struct tar_expander
{
    tar_expander() : out(0), status(header)
    {
    }
    ~tar_expander()
    {
        delete out;
    }

    /* qualify filter */
    typedef char char_type;
    struct category :
        io::input_filter_tag,
        io::multichar_tag
    { };

    template<typename Source>
    void fetch_n(Source& src, std::streamsize n = block_size)
    {
           /* my utility */
           ....
    }

    // Read up to n filtered characters into the buffer s,
    // returning the number of characters read or -1 for EOF.
    // Use src to access the unfiltered character sequence
    template<typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n)
    {
      fetch_n(src);
      const tar_header &h = cast_buf<tar_header>();
      int r;

      if (status == header)
      {
          ...
      }
      ...
    }

    // filter state (data members of tar_expander, not locals of read)
    std::ofstream *out;
    size_t fsize, stored;

    static const size_t block_size = 512;
    std::vector<char> buf;

    enum { header, store_file, archive_end } status;
};

When my read(Source&, ...) function is called, it receives the already decompressed data through src. To use the filter:

ifstream file("/home/..../resample-1.8.1.tar.gz", ios_base::in | ios_base::binary);
io::filtering_streambuf<io::input> in;
in.push(tar_expander());
in.push(io::gzip_decompressor());
in.push(file);
io::copy(in, cout);
CapelliC answered Oct 13 '22 12:10


According to the Boost.Iostreams documentation, the type boost::iostreams::filtering_istream derives from std::istream. That is, it should be possible to pass it anywhere a std::istream& is expected. If you get errors at run-time because you need to unget() or putback() characters, you should have a look at the pback_size parameter, which specifies the maximum number of characters that can be put back. I haven't seen in the documentation what the default value for this parameter is.

If this doesn't solve your problem, can you describe what exactly your problem is? From the looks of it, it should work.

Dietmar Kühl answered Oct 13 '22 12:10