I'm trying to read text files into Pandas dataframes from inside a zipped archive. The files are formatted like this:
System Time hh:mm:ss PPS Zsec(sec) Hex Message
Yr=17 Mn= 3 Dy= 3
19:22:59.894 19:22:16 52 69736 7E 32 02 4F 02 00 0C 7F 97 68 10 01 00 11 03 03 13 16 10 34 00 00 00 05 02 00 80 00 83 B1 7E
19:24:12.130 19:23:10 106 69790 7E 32 02 4F 02 00 0C 7F 97 9E 10 01 00 11 03 03 13 17 0A 6A 00 00 00 05 12 00 BA 00 47 DF 7E
19:24:13.241 19:23:11 107 69791 7E 32 02 4F 02 00 0C 7F 97 9F 10 01 00 11 03 03 13 17 0B 6B 00 00 00 05 05 00 BC 00 F3 AC 7E
If the file is extracted outside the archive, I can read it:
data = '../data/test1/heartbeat.txt'
df = pd.read_csv(data, sep=r'\s{2,}', engine='python', skiprows=4, encoding='utf8',
                 names=['System Time', 'hh:mm:ss', 'PPS', 'Zsec(sec)', 'Hex Message'])
But that approach fails if I try to access it inside the zipfile:
zf = zipfile.ZipFile('../data.zip', 'r')
data = zf.open('data/test1/heartbeat.txt')
df = pd.read_csv(data, sep=r'\s{2,}', engine='python', skiprows=4, encoding='utf8',
                 names=['System Time', 'hh:mm:ss', 'PPS', 'Zsec(sec)', 'Hex Message'])
I see TypeError: cannot use a string pattern on a bytes-like object
If I use delim_whitespace=True instead of sep='\s{2,}', the file is read, so zipfile itself seems to be working. However, the 'Hex Message' column contains single spaces, so its bytes get split across many extra columns in the dataframe.
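As a workaround (just a sketch, not a proper fix), the spilled columns can be glued back together after reading, assuming every row has the same four leading fields and everything after them belongs to the hex message. The two-line sample below is trimmed from the data in the question:

```python
import io

import pandas as pd

# Trimmed sample in the same layout as the log file.
text = (
    "19:22:59.894 19:22:16 52 69736 7E 32 02 4F 02 00 0C\n"
    "19:24:12.130 19:23:10 106 69790 7E 32 02 4F 02 00 0C\n"
)

# Split on runs of whitespace (equivalent to delim_whitespace=True, which is
# deprecated in newer pandas).  dtype=str keeps hex bytes like '02' intact.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, dtype=str)

# Re-join everything after the fourth column into one 'Hex Message' string.
hex_msg = df.iloc[:, 4:].agg(" ".join, axis=1)
df = df.iloc[:, :4]
df.columns = ["System Time", "hh:mm:ss", "PPS", "Zsec(sec)"]
df["Hex Message"] = hex_msg
```

Note the dtype=str: without it, pandas parses a hex byte such as '02' as the integer 2 and the re-joined message is corrupted.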
I've also tried fixed-width column reading with read_fwf, which also works on the extracted file:
data = '../data/test1/heartbeat.txt'
widths = [13,14,10,13,100]
df = pd.read_fwf(data,widths=widths,skiprows=4,
names = ['System Time', 'hh:mm:ss', 'PPS', 'Zsec(sec)','Hex Message'])
But that also fails when the file is inside the zip archive: TypeError: a bytes-like object is required, not 'str'
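Both TypeErrors have the same cause: zf.open() returns a binary stream, while the python parsing engine and read_fwf expect text. One way to bridge the two (a sketch, using a small in-memory zip so it is self-contained; the member path mirrors the question, and the padded data line is invented to match the widths) is to wrap the stream in io.TextIOWrapper, which decodes bytes to str on the fly:

```python
import io
import zipfile

import pandas as pd

# Build a tiny in-memory zip standing in for ../data.zip.
line = ('19:22:59.894 '    # System Time, width 13
        '19:22:16      '   # hh:mm:ss,    width 14
        '52        '       # PPS,         width 10
        '69736        '    # Zsec(sec),   width 13
        '7E 32 02 4F')     # Hex Message
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zw:
    zw.writestr('data/test1/heartbeat.txt', 'h1\nh2\nh3\nh4\n' + line + '\n')

zf = zipfile.ZipFile(buf, 'r')
# zf.open() yields a binary stream; TextIOWrapper decodes it as it is read,
# so read_fwf (and regex separators in read_csv) see str rather than bytes.
data = io.TextIOWrapper(zf.open('data/test1/heartbeat.txt'), encoding='utf-8')
widths = [13, 14, 10, 13, 100]
df = pd.read_fwf(data, widths=widths, skiprows=4,
                 names=['System Time', 'hh:mm:ss', 'PPS', 'Zsec(sec)', 'Hex Message'])
```

Unlike reading the whole member into memory first, this streams the decompressed bytes through the parser.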
I'm not sure how to translate these bytes-like objects from the zipfile into something the Pandas reader can parse.
This is working for me:
zf = zipfile.ZipFile('../data.zip', 'r')
data = io.StringIO(zf.read('data/test1/heartbeat.txt').decode('utf_8'))
df = pd.read_csv(data, sep=r'\s{2,}', engine='python', skiprows=4, encoding='utf8',
                 names=['System Time', 'hh:mm:ss', 'PPS', 'Zsec(sec)', 'Hex Message'])
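For reference, here is a self-contained check of that approach (it builds a small in-memory zip with an invented sample line, since ../data.zip isn't available here). zf.read() returns the member's raw bytes, .decode() turns them into a str, and io.StringIO makes that str file-like, so the regex separator works and the 'Hex Message' column keeps its internal single spaces:

```python
import io
import zipfile

import pandas as pd

# Two-plus-space-separated columns, single spaces inside the hex message.
content = ('h1\nh2\nh3\nh4\n'
           '19:22:59.894  19:22:16  52  69736  7E 32 02 4F\n')

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zw:
    zw.writestr('data/test1/heartbeat.txt', content)

zf = zipfile.ZipFile(buf, 'r')
# read() -> bytes, decode() -> str, StringIO -> text file-like object.
data = io.StringIO(zf.read('data/test1/heartbeat.txt').decode('utf_8'))
df = pd.read_csv(data, sep=r'\s{2,}', engine='python', skiprows=4,
                 names=['System Time', 'hh:mm:ss', 'PPS', 'Zsec(sec)', 'Hex Message'])
```

One caveat: zf.read() loads the entire decompressed member into memory at once, which is fine for logs of this size but worth knowing for very large archives.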