Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Organizing files in tar bz2 file with python

I have about 200,000 text files that are placed in a bz2 file. The issue I have is that when I scan the bz2 file to extract the data I need, it goes extremely slow. It has to look through the entire bz2 file to fine the single file I am looking for. Is there anyway to speed this up?

Also, I thought about possibly organizing the files in the tar.bz2 so I can instead have it know where to look. Is there anyway to organize files that are put into a bz2?

More Info/Edit: I need to query the compressed file for each textfile. Is there a better compression method that supports such a large number of files and is as thoroughly compressed?

like image 405
xZel Avatar asked Mar 01 '26 15:03

xZel


1 Answers

Do you have to use bzip2? Reading it's documentation, it's quite clear it's not designed to support random access. Perhaps you should use a compression format that more closely matches your requirements. The good old Zip format supports random access, but might compress worse, of course.

like image 185
unwind Avatar answered Mar 04 '26 04:03

unwind



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!