 

Open an LZO file in Python without decompressing the file

Tags:

python

lzo

I'm currently working on a 3rd-year project involving data from Twitter. The department has provided me with .lzo files covering a month's worth of Twitter data. The smallest is 4.9 GB and decompresses to 29 GB, so I'm trying to open the file and read it as I go. Is this possible, or do I need to decompress it and work with the data that way?

EDIT: I have attempted to read it line by line and decompress each line as I read it.

UPDATE: Found a solution: reading the STDOUT of lzop -dc works like a charm.

DrugCrazed asked Dec 12 '22 20:12


2 Answers

How about starting the lzop binary in a subprocess with the -d (decompress) and -c (write to standard output) switches and then reading its STDOUT line by line?
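A minimal sketch of this approach, assuming the lzop binary is installed and on PATH. The helper names stream_command_lines and stream_lzo_lines are hypothetical; the generic helper is kept separate so it works with any command, and the lzop wrapper only adds the -dc switches.

```python
import subprocess
from typing import Iterator, List


def stream_command_lines(cmd: List[str]) -> Iterator[str]:
    """Run cmd and yield its stdout one decoded line at a time.

    Only the current line (plus the OS pipe buffer) is held in memory,
    so the total output can be far larger than available RAM.
    """
    # errors="replace" keeps going if the data contains invalid UTF-8 bytes.
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, encoding="utf-8", errors="replace"
    )
    try:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        # Runs on normal exhaustion and when the caller stops early
        # (e.g. breaks out of the loop); closing the pipe lets the child exit.
        if proc.stdout is not None:
            proc.stdout.close()
        returncode = proc.wait()
    # Reached only after all output was consumed: surface a failed command.
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)


def stream_lzo_lines(path: str) -> Iterator[str]:
    """Yield the decompressed lines of an .lzo file via `lzop -dc`."""
    # -d: decompress, -c: write to stdout instead of creating a file.
    yield from stream_command_lines(["lzop", "-dc", path])
```

For example, `for line in stream_lzo_lines("tweets.lzo"):` (a placeholder filename) processes the decompressed text one line at a time without ever writing the 29 GB to disk or holding it in memory.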

eumiro answered Mar 24 '23 09:03


I know of only one LZO library for Python, https://github.com/jd-boyd/python-lzo, and it requires full decompression (moreover, it decompresses the contents in memory).

So I think you'll need to decompress the files before working with them.

cleg answered Mar 24 '23 09:03