
Read tab-delimited fields with pandas when some lines have more than one tab

Tags:

pandas

I am trying to read a tab-separated text file using pandas. The file looks like this:

data file sample

14.38   14.21   0.8951  5.386   3.312   2.462   4.956   1
14.69   14.49   0.8799  5.563   3.259   3.586   5.219   1
14.11   14.12   0.8911  5.422   3.302   2.723           5       1

Some lines have extra tabs. If I use read_csv or read_fwf and specify sep='\t', I get results that look like this:

0   15.26\t14.84\t0.871\t5.763\t3.312\t2.221\t5.22\t1
1   14.88\t14.57\t0.8811\t5.554\t3.333\t1.018\t4.9

Do you have any suggestions as to what parameters I could specify to deal with this problem?

Solution:

Use pd.read_csv(filename, delim_whitespace=True)

Ying G. asked Nov 19 '25 08:11



1 Answer

pandas read_csv is very versatile. Pass delim_whitespace=True and it treats any run of whitespace (tabs or spaces) as a single delimiter, so lines with extra tabs still split into the same number of fields.

df = pd.read_csv(filename, delim_whitespace=True)
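
A minimal sketch of this option, using the rows from the question. The inline `sample` string stands in for the real file, and `header=None` is an assumption (the sample shows no header row). It uses `sep=r"\s+"`, which the pandas documentation describes as equivalent to `delim_whitespace=True`; recent pandas releases deprecate `delim_whitespace` in favor of that spelling.

```python
import io

import pandas as pd

# Stand-in for the real file: the third row contains runs of several tabs.
sample = (
    "14.38\t14.21\t0.8951\t5.386\t3.312\t2.462\t4.956\t1\n"
    "14.69\t14.49\t0.8799\t5.563\t3.259\t3.586\t5.219\t1\n"
    "14.11\t14.12\t0.8911\t5.422\t3.302\t2.723\t\t\t5\t\t1\n"
)

# Any run of whitespace (tabs or spaces) counts as one delimiter.
# header=None is an assumption: the sample data has no header row.
df_ws = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)
print(df_ws.shape)
```

Note that this splits on spaces as well as tabs, so it only suits files whose fields contain no embedded spaces.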

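Option 2: Use the sep argument with a regular expression that matches one or more tabs. A multi-character separator is interpreted as a regex and handled by the Python parsing engine, so request that engine explicitly to avoid a ParserWarning.

df = pd.read_csv(filename, sep=r'\t+', engine='python')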
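
A small sketch of when the tab-only regex might be preferable. The two rows below are hypothetical (they are not from the question's file) and were chosen so that one field contains a space; `header=None` is again an assumption.

```python
import io

import pandas as pd

# Hypothetical rows: the first field contains a space, and the
# number of tabs between fields varies from line to line.
sample = (
    "red apple\t\t3\t1.5\n"
    "green pear\t4\t\t\t2.0\n"
)

# r"\t+" matches one or more tabs only, so spaces stay inside a field.
# Regex separators need the Python engine; naming it avoids a warning.
df_tabs = pd.read_csv(
    io.StringIO(sample), sep=r"\t+", engine="python", header=None
)
print(df_tabs)
```

With a whitespace separator the first field would be split in two, so the tab-only pattern is the safer choice when text columns may contain spaces. Both approaches collapse consecutive delimiters, which means a genuinely empty field cannot be told apart from padding.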

Vaishali answered Nov 21 '25 09:11



