Python pandas delimiter misprint - double sign

This is my code to open the file:

df = pd.read_csv(path_df, delimiter='|')

I get this error: Error tokenizing data. C error: Expected 5 fields in line 13571, saw 6

When I checked this particular line, I saw a misprint: three signs "|||" instead of one. I would prefer to treat double and triple signs as a single delimiter, but there may be another solution.

How can I solve this problem?

asked Nov 20 '25 10:11 by Pinky the mouse


1 Answer

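Use a regex separator, [|]+, which matches one or more | characters. A regex separator is only supported by the python parsing engine, so pass engine='python' as well: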

from io import StringIO

import pandas as pd

temp = u"""a|b|c
ss|||s|s
t|g|e"""
# after testing, replace StringIO(temp) with 'filename.csv'
# (pd.compat.StringIO was removed from newer pandas; io.StringIO works everywhere)
df = pd.read_csv(StringIO(temp), sep="[|]+", engine='python')

print(df)
    a  b  c
0  ss  s  s
1   t  g  e
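
Since the question asks about other solutions, here is one possible alternative, offered as a sketch and not as part of the original answer: collapse repeated delimiters in the text first, then parse with the default C engine, which is usually faster than the python engine on large files. The helper name read_collapsing_delimiters is hypothetical and is not a pandas function.

```python
import re
from io import StringIO

import pandas as pd


def read_collapsing_delimiters(source, delimiter="|"):
    """Collapse runs of `delimiter` into one, then parse with the default C engine.

    `source` is any iterable of text lines, e.g. an open file object.
    Hypothetical helper for illustration; not part of pandas.
    """
    # re.escape makes the delimiter literal, so "|" is not read as regex alternation
    run = re.compile(re.escape(delimiter) + "+")
    cleaned = "".join(run.sub(delimiter, line) for line in source)
    return pd.read_csv(StringIO(cleaned), delimiter=delimiter)


# Same sample data as in the answer above
sample = StringIO("a|b|c\nss|||s|s\nt|g|e\n")
df = read_collapsing_delimiters(sample)
print(df)
```

For a real file, open it and pass the file object, e.g. with open(path_df) as f: df = read_collapsing_delimiters(f). Note that this approach, like the regex separator, also merges genuinely empty fields (a||c becomes a|c), so it only fits data where empty fields do not occur.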
answered Nov 22 '25 00:11 by jezrael


