I have data stored on disk in files that are far too big to store in main memory.
I want to stream this data from the disk into a data processing pipeline via iconv, like this:
zcat myfile | iconv -f L1 -t UTF-8 | # rest of the pipeline goes here
Unfortunately, I'm seeing iconv buffer the entire file in memory until the input is exhausted before outputting any data. This means that I'm using up all of my main memory on a blocking operation in a pipeline whose memory footprint is otherwise minimal.
I've tried calling iconv like this:
stdbuf -o 0 iconv -f L1 -t UTF-8
But it looks like iconv is managing the buffering internally; it's nothing to do with the Linux pipe buffer.
I'm seeing this with the binary that's packaged with glibc 2.6 and 2.7 in Arch Linux, and I've replicated it with glibc 2.5 in Debian.
Is there some way around this? I know that streaming character conversions are not simple, but I'd have thought that such a commonly used Unix tool would work on streams; it's not at all rare to work with files that won't fit in main memory. Would I have to roll my own binary linked against libiconv?
Consider the iconv(3) call together with iconv_open(3): hook a simple C routine up to those two calls, reading from stdin and writing to stdout. Have a read of this example:
http://www.gnu.org/software/libc/manual/html_node/iconv-Examples.html
This example is explicitly meant to handle what you are describing: it avoids "stateful" waits for data.
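For illustration, here is a minimal sketch of such a routine, modelled loosely on the glibc example linked above. The chunk size, the hard-coded ISO-8859-1 to UTF-8 pair, and the simplistic error handling are my assumptions, not part of the original answer; the point is only that each chunk is converted and written out before the next one is read, so memory use stays bounded.

/* stream_iconv.c: convert Latin-1 on stdin to UTF-8 on stdout, chunk by chunk */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <iconv.h>

#define BUFSZ 4096

int main(void)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t) -1) {
        perror("iconv_open");
        return 1;
    }

    char inbuf[BUFSZ];
    char outbuf[BUFSZ * 4];   /* UTF-8 output may be larger than the Latin-1 input */
    size_t leftover = 0;      /* bytes of an incomplete sequence carried to the next chunk */

    for (;;) {
        size_t nread = fread(inbuf + leftover, 1, sizeof inbuf - leftover, stdin);
        if (nread == 0 && leftover == 0)
            break;            /* EOF and nothing pending */

        char *inptr = inbuf;
        size_t inleft = leftover + nread;
        char *outptr = outbuf;
        size_t outleft = sizeof outbuf;

        if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1
            && errno != EINVAL) {   /* EINVAL: incomplete sequence at end of this chunk */
            perror("iconv");
            return 1;
        }

        /* write this chunk's output immediately instead of accumulating it */
        fwrite(outbuf, 1, sizeof outbuf - outleft, stdout);

        /* keep any trailing incomplete sequence for the next iteration */
        leftover = inleft;
        memmove(inbuf, inptr, leftover);

        if (nread == 0)
            break;            /* EOF reached with an unconvertible tail; stop */
    }

    iconv_close(cd);
    return 0;
}

Compile with something like cc -o stream_iconv stream_iconv.c (add -liconv on systems where iconv is not in libc) and drop it into the pipeline in place of the iconv invocation.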