I'm currently working on a 3rd-year project involving data from Twitter. The department has provided me with .lzo files containing a month's worth of Twitter data. The smallest is 4.9 GB compressed and 29 GB decompressed, so I'm trying to open the file and read it as I go. Is this possible, or do I need to decompress it first and work with the data that way?
EDIT: I have attempted to read it line by line and decompress each line as it is read.
UPDATE: Found a solution: reading the STDOUT of lzop -dc works like a charm.
How about starting the lzop binary in a subprocess with the -d and -c switches (decompress and write to standard output) and then reading its STDOUT line by line?
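A minimal sketch of that idea in Python, assuming the lzop command-line tool is installed and on PATH and that the decompressed data is UTF-8 text with one record per line. The helper names stream_lines and lzop_lines are hypothetical, chosen for illustration. Because the output is consumed through a pipe, memory use is bounded by the length of a line rather than by the 29 GB of decompressed data.

```python
import subprocess
from typing import Iterator, Sequence


def stream_lines(cmd: Sequence[str]) -> Iterator[str]:
    """Hypothetical helper: run cmd and yield its stdout one line at a time."""
    # stdout is a pipe, so lines are read as the child produces them;
    # nothing is buffered in full or written to disk.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          encoding="utf-8", errors="replace") as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the `with` block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


def lzop_lines(path: str) -> Iterator[str]:
    """Hypothetical wrapper: stream an .lzo file via `lzop -dc` (assumes lzop is on PATH)."""
    # -d decompresses, -c writes the result to standard output.
    return stream_lines(["lzop", "-dc", path])
```

Usage would look like `for line in lzop_lines("tweets.lzo"): ...` (the file name is a placeholder). Splitting the generic pipe reader from the lzop-specific command also makes it easy to swap in another decompressor.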
I know of only one LZO library for Python, https://github.com/jd-boyd/python-lzo, and it requires full decompression (moreover, it decompresses the contents in memory).
So I think you'll need to decompress the files before working with them.