How can I read tar.gz file using pandas read_csv with gzip compression option?

Question

I have a very simple CSV with the following data, compressed inside a tar.gz file. I need to read it into a DataFrame using pandas.read_csv.

   A  B
0  1  4
1  2  5
2  3  6

import pandas as pd
pd.read_csv("sample.tar.gz", compression='gzip')

However, I get this error:

CParserError: Error tokenizing data. C error: Expected 1 fields in line 440, saw 2

Here are the read_csv calls I tried and the error each one produced:

pd.read_csv("sample.tar.gz", compression='gzip', engine='python')
Error: line contains NULL byte

pd.read_csv("sample.tar.gz", compression='gzip', header=0)
CParserError: Error tokenizing data. C error: Expected 1 fields in line 440, saw 2

pd.read_csv("sample.tar.gz", compression='gzip', header=0, sep=" ")
CParserError: Error tokenizing data. C error: Expected 2 fields in line 94, saw 14

pd.read_csv("sample.tar.gz", compression='gzip', header=0, sep=" ", engine='python')
Error: line contains NULL byte

What's going wrong here? How can I fix this?

Marlon Abeykoon · Accepted Answer

df = pd.read_csv('sample.tar.gz', compression='gzip', header=0, sep=' ', quotechar='"', error_bad_lines=False)

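Note: error_bad_lines=False skips the offending rows instead of raising an error, so any data on those rows is silently dropped. In pandas 1.3 and later this parameter is deprecated in favor of on_bad_lines='skip', and it was removed in pandas 2.0.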
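For context on where the malformed rows and NULL bytes likely come from: compression='gzip' removes only the gzip layer, and what remains is a tar stream, not a bare CSV. The sketch below uses only the standard library to build a tiny archive in memory and inspect the bytes left after gunzipping. The file name and contents are illustrative stand-ins for the asker's data, and the archive format is pinned to USTAR just to keep the layout predictable.

```python
import gzip
import io
import tarfile

# Illustrative stand-in for the CSV inside the asker's archive.
csv_bytes = b"A B\n1 4\n2 5\n3 6\n"

# Build a small .tar.gz in memory (USTAR format chosen for a predictable layout).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo(name="sample.csv")
    info.size = len(csv_bytes)
    tar.addfile(info, io.BytesIO(csv_bytes))

# Undo only the gzip layer; the result is still a tar stream.
raw = gzip.decompress(buf.getvalue())

# A tar stream is made of 512-byte blocks: a header block, then the file data.
header = raw[:512]
body = raw[512:512 + len(csv_bytes)]

print(header[:10])            # the member name sits at the start of the header
print(b"\x00" in header)      # the header is padded with NUL bytes
print(body == csv_bytes)      # the CSV text only starts after the header block
```

A parser that is handed this stream sees the header block and NUL padding as if they were CSV text, which is consistent with both the "NULL byte" and the "Expected N fields" errors in the question.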

teichert · Answer

You can use the tarfile module to read a particular file from the tar.gz archive (as discussed in this resolved issue). If there is only one file in the archive, then you can do this:

import tarfile
import pandas as pd

with tarfile.open("sample.tar.gz", "r:*") as tar:
    csv_path = tar.getnames()[0]
    df = pd.read_csv(tar.extractfile(csv_path), header=0, sep=" ")

The read mode r:* lets tarfile detect the compression (gzip or another supported kind) transparently. If the archive contains multiple files, you could instead use something like csv_path = [n for n in tar.getnames() if n.endswith('.csv')][-1] to pick the last CSV file in the archive.
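The multi-file case can be wrapped in a small helper. This is a minimal sketch, not part of pandas or tarfile: the name read_csv_from_tar and its suffix parameter are hypothetical, and the demo archive is built in memory so the example runs without any file on disk.

```python
import io
import tarfile

import pandas as pd


def read_csv_from_tar(archive, suffix=".csv", **read_csv_kwargs):
    """Read the last regular file ending in `suffix` from a tar archive.

    `archive` may be a path (str) or a seekable binary file object.
    Hypothetical helper for illustration only.
    """
    if isinstance(archive, str):
        tar = tarfile.open(name=archive, mode="r:*")
    else:
        tar = tarfile.open(fileobj=archive, mode="r:*")
    with tar:
        # Skip directories and other non-file members.
        members = [m for m in tar.getmembers()
                   if m.isfile() and m.name.endswith(suffix)]
        if not members:
            raise FileNotFoundError(f"no member ending in {suffix!r}")
        # Parse while the archive is still open.
        return pd.read_csv(tar.extractfile(members[-1]), **read_csv_kwargs)


def make_demo_archive():
    """Build an in-memory .tar.gz holding one text file and one CSV."""
    files = [
        ("data/readme.txt", "not a csv\n"),
        ("data/sample.csv", "A B\n1 4\n2 5\n3 6\n"),
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in files:
            payload = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


df = read_csv_from_tar(make_demo_archive(), sep=" ")
print(df)
```

With a real archive on disk, the call would be read_csv_from_tar("sample.tar.gz", sep=" "). Filtering on isfile() avoids picking up a directory entry whose name happens to match the suffix.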

Tags:

python

pandas

csv

gzip

tar

Geet
