Reading first lines of bz2 files in python

Tags:

python

bz2

I am trying to extract the first 10,000 lines from a bz2 file.

   import bz2       
   file = "file.bz2"
   file_10000 = "file.txt"

   output_file = codecs.open(file_10000,'w+','utf-8')

   source_file = bz2.open(file, "r")
   count = 0
   for line in source_file:
       count += 1
       if count < 10000:
           output_file.writerow(line)

But I get the error "'module' object has no attribute 'open'". Do you have any ideas? Or maybe I could save the first 10,000 lines to a txt file in some other way? I am on Windows.


asked May 11 '16 20:05

student

1 Answer

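The error suggests you are running Python 2, where the bz2 module has no open function (bz2.open was added in Python 3.3); bz2.BZ2File is available in both. Here is a fully working example that includes writing and reading a test file that is much smaller than your 10,000 lines. It's nice to have working examples in questions so we can test easily.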

import bz2
import itertools
import codecs

file = "file.bz2"
file_10000 = "file.txt"

# write test file with 9 lines
with bz2.BZ2File(file, "w") as fp:
    # BZ2File is a binary file, so write bytes (required on Python 3)
    fp.write('\n'.join('123456789').encode('ascii'))

# the original script using BZ2File ... and 3 lines for test
# ...and fixing bugs:
#     1) it only writes 9999 instead of 10000
#     2) files don't do writerow
#     3) close the files

output_file = codecs.open(file_10000,'w+','utf-8')

source_file = bz2.BZ2File(file, "r")
count = 0
for line in source_file:
    count += 1
    if count <= 3:
        # lines read from BZ2File are bytes; decode before writing text
        output_file.write(line.decode('utf-8'))
source_file.close()
output_file.close()

# show what you got
print('---- Test 1 ----')
print(repr(open(file_10000).read()))

A more efficient way is to break out of the for loop once you have read the lines you want. You can even use iterators to slim the code down, like so:

# a faster way to read first 3 lines
with bz2.BZ2File(file) as source_file,\
        codecs.open(file_10000,'w+','utf-8') as output_file:
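    # islice stops after 3 lines; decode because BZ2File yields bytes
    output_file.writelines(
        line.decode('utf-8') for line in itertools.islice(source_file, 3))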

# show what you got
print('---- Test 2 ----')
print(repr(open(file_10000).read()))
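
If you can run Python 3.3 or later, the bz2.open call from the question does exist, and opening the file in text mode removes the need for codecs. The following is only a minimal sketch under that assumption; the function name copy_first_lines and the UTF-8 encoding are illustrative choices, not something taken from the question.

```python
import bz2
import itertools


def copy_first_lines(src_path, dst_path, n):
    """Copy at most the first n lines of a bz2 file into a plain text file.

    Hypothetical helper for illustration; assumes Python 3.3+ and
    UTF-8 encoded content.
    """
    # "rt" asks bz2.open for text mode, so lines arrive as str, not bytes
    with bz2.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after n lines, so the loop never iterates
        # over the remaining lines of the file
        dst.writelines(itertools.islice(src, n))
```

For the files in the question this would be called as copy_first_lines("file.bz2", "file.txt", 10000). If the file holds fewer than n lines, everything is copied and no error is raised.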

answered Oct 04 '22 05:10

tdelaney
