I am reading a gzipped file using boost iostreams: The following works fine:
namespace io = boost::iostreams;
io::filtering_istream in;
in.push(boost::iostreams::basic_gzip_decompressor<>());
in.push(io::file_source("test.gz"));
stringstream ss;
copy(in, ss);
However, I don't want to take the memory hit of reading an entire gzipped file into memory. I want to be able to read the file incrementally.
For example, if I have a data structure X that initializes itself from istream,
X x;
x.read(in);
fails. Presumably this is because we may have to put back characters into the stream if we are doing partial streams. Any ideas whether boost iostreams supports this?
I think you need to write your own filter. For instance, to read a .tar.gz and output the files contained, I wrote something like
//using namespace std;
namespace io = boost::iostreams;
struct tar_expander
{
tar_expander() : out(0), status(header)
{
}
~tar_expander()
{
delete out;
}
/* qualify filter */
typedef char char_type;
struct category :
io::input_filter_tag,
io::multichar_tag
{ };
template<typename Source>
void fetch_n(Source& src, std::streamsize n = block_size)
{
/* my utility */
....
}
// Read up to n filtered characters into the buffer s,
// returning the number of characters read or -1 for EOF.
// Use src to access the unfiltered character sequence
template<typename Source>
std::streamsize read(Source& src, char* s, std::streamsize n)
{
fetch_n(src);
const tar_header &h = cast_buf<tar_header>();
int r;
if (status == header)
{
...
}
std::ofstream *out;
size_t fsize, stored;
static const size_t block_size = 512;
std::vector<char> buf;
enum { header, store_file, archive_end } status;
}
}
My function read(Source &...)
when called receives the unzipped text.
To use the filter:
ifstream file("/home/..../resample-1.8.1.tar.gz", ios_base::in | ios_base::binary);
io::filtering_streambuf<io::input> in;
in.push(tar_expander());
in.push(io::gzip_decompressor());
in.push(file);
io::copy(in, cout);
According to the iostream documentation the type boost::io::filtering_istream
derives from std::istream
. That is, it should be possible to pass this everywhere an std::istream&
is expected. If you have errors at run-time because you need to unget()
or putback()
characters you should have a look at the pback_size
parameter which specifies how many characters are return at most. I haven't seen in the documentation what the default value for this parameter is.
If this doesn't solve your problem can you describe what your problem is exactly? From the looks of it should work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With