Pandas ParserError: Error tokenizing data. C error: EOF inside string

I have data that is over 400,000 lines long. When running this code:

f=pd.read_csv(filename,error_bad_lines=False)

I get the following error:

pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 454751

My data by the end of the file looks like this:

BTC 9948    8718    1.57E+12    ASK
BTC 52      8718    1.57E+12    ASK
BTC 120     8718    1.57E+12    ASK
BTC 200     8718    1.57E+12    ASK
BTC 150     8718    1.57E+12    ASK
BTC 50      8718    1.57E+12    ASK
BTC 10      8718    1.57E+12    ASK
BTC 57      8718    1.57E+12    ASK
BTC 50      8718    1.57E+12    ASK
BTC 50191   8718    

Line 454751 is this one: BTC 50 8718 1.57E+12 ASK
I tried passing error_bad_lines=False, as shown above, but the error persists. I also searched my file for quote characters and found none.

AspiringCoder asked Feb 21 '20 20:02


1 Answer

Changing the parser engine from C to Python should solve your problem. Use the following line to read your CSV:

f=pd.read_csv(filename,error_bad_lines=False, engine="python")
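
Note for newer pandas: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0, so the line above raises a TypeError there. A rough equivalent, assuming pandas 1.3 or later, uses on_bad_lines="skip". The sample data and column names below are made up for illustration and are not taken from the question's file.

```python
import io

import pandas as pd

# Hypothetical sample data; the second data row has one field too many.
sample = (
    "symbol,size,price,timestamp,side\n"
    "BTC,50,8718,1.57E+12,ASK\n"
    "BTC,10,8718,1.57E+12,ASK,EXTRA\n"
    "BTC,57,8718,1.57E+12,ASK\n"
)

# on_bad_lines="skip" replaces error_bad_lines=False (assumes pandas >= 1.3).
df = pd.read_csv(io.StringIO(sample), engine="python", on_bad_lines="skip")

print(df.shape)
```

For a real file, pass the path in place of the io.StringIO object.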

From the read_csv documentation:

engine : {‘c’, ‘python’}, optional
Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.
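
If the Python engine is too slow for a file this size, another option may be to turn off quote handling instead. The C tokenizer raises "EOF inside string" when it reaches the end of the file while still inside a quoted field, which usually points to a stray, unclosed quote character at or after the reported row. Here is a minimal sketch, assuming such a character is present and that quotes carry no meaning in the data. The sample and its column names are hypothetical.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data; the last row contains a stray, unclosed quote.
sample = (
    "symbol,size,price,timestamp,side\n"
    "BTC,50,8718,1.57E+12,ASK\n"
    "BTC,10,8718,1.57E+12,ASK\n"
    'BTC,50191,8718,"1.57E+12,ASK\n'
)

# QUOTE_NONE tells the parser to treat quote characters as ordinary text,
# so an unclosed quote can no longer swallow the rest of the file.
df = pd.read_csv(io.StringIO(sample), quoting=csv.QUOTE_NONE)

print(df.tail(1))
```

With this setting the quote character stays in the data as literal text, so the affected column comes back as strings and may need cleaning afterwards.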

Rahul P answered Nov 14 '22 19:11