How can I read line-by-line using Boost IOStreams' interface for Gzip files?

I managed to integrate the Boost IOStreams APIs for reading gzipped files. I followed the documentation on the Boost page and have the following code so far:

#include <fstream>
#include <sstream>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/filter/gzip.hpp>

std::stringstream outStr;
std::ifstream file("file.gz", std::ios_base::in | std::ios_base::binary);
try {
    boost::iostreams::filtering_istreambuf in;
    in.push(boost::iostreams::gzip_decompressor());
    in.push(file);
    boost::iostreams::copy(in, outStr);
}
catch(const boost::iostreams::gzip_error& exception) {
    int error = exception.error();
    if (error == boost::iostreams::gzip::zlib_error) {
       // check for all error codes
    }
}

The code works fine (so please ignore any typos and errors above :)).

  1. It looks like the above code reads the complete file and stores it in memory while creating the filtering_istreambuf. Is that true? From my investigation, it seems so. If the file is read into memory, this code can be an issue for large files (which is what I'm dealing with).
  2. My current code reads the gzipped file line by line using the gzgets API from zlib. Is there a way to do line-by-line reading using the Boost APIs?
cppcoder asked Jun 21 '11 05:06


1 Answer

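1) Yes, the above code will copy() the entire decompressed file into the string buffer outStr. It is the copy() call that does this, not the construction of the filtering_istreambuf. According to the description of copy: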

The function template copy reads data from a given model of Source and writes it to a given model of Sink until the end of stream is reached.

2) Switch from filtering_istreambuf to filtering_istream, and std::getline() will work:

#include <iostream>
#include <fstream>
#include <string>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
int main()
{
    std::ifstream file("file.gz", std::ios_base::in | std::ios_base::binary);
    try {
        boost::iostreams::filtering_istream in;
        in.push(boost::iostreams::gzip_decompressor());
        in.push(file);
        // std::getline catches exceptions thrown by the stream buffer and only
        // sets badbit; enabling badbit exceptions lets gzip_error reach the catch.
        in.exceptions(std::ios_base::badbit);
        for(std::string str; std::getline(in, str); )
        {
            std::cout << "Processed line " << str << '\n';
        }
    }
    catch(const boost::iostreams::gzip_error& e) {
         std::cout << e.what() << '\n';
    }
}

(You can put std::cout << file.tellg() << '\n'; inside that loop if you want proof. The position in the compressed file increases in sizeable chunks as lines are read, but it doesn't jump to the length of the file from the start.)

Cubbi answered Sep 19 '22 14:09