I have data that is over 400,000 lines long. When running this code:
f = pd.read_csv(filename, error_bad_lines=False)
I get the following error:
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 454751
The end of my file looks like this:
BTC 9948    8718    1.57E+12    ASK
BTC 52      8718    1.57E+12    ASK
BTC 120     8718    1.57E+12    ASK
BTC 200     8718    1.57E+12    ASK
BTC 150     8718    1.57E+12    ASK
BTC 50      8718    1.57E+12    ASK
BTC 10      8718    1.57E+12    ASK
BTC 57      8718    1.57E+12    ASK
BTC 50      8718    1.57E+12    ASK
BTC 50191   8718    
Line 454751 is this one: BTC  50      8718    1.57E+12    ASK
I tried passing error_bad_lines=False, as seen above, but that still doesn't work. I also searched my file for quotes, but it does not contain any.
Changing the parser engine from C to Python should solve your problem. Use the following line to read your CSV:
f = pd.read_csv(filename, error_bad_lines=False, engine="python")
From the read_csv documentation:
engine : {‘c’, ‘python’}, optional
Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.
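On newer pandas versions the same idea needs a different keyword: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0, and on_bad_lines="skip" takes its place. Here is a minimal sketch of that call. The sample text is made up to stand in for your file, and sep="\t" and header=None are assumptions based on how your data looks. quoting=csv.QUOTE_NONE is an extra option, not part of the fix above: "EOF inside string" means the parser reached the end of the file while still inside a quoted field, so if a stray quote character turns out to be hiding in the data, this tells pandas to treat it as ordinary text.

```python
import csv
import io

import pandas as pd

# Hypothetical sample standing in for the real file: tab-separated rows,
# with a stray, never-closed double quote in the last row.
sample = (
    "BTC\t50\t8718\t1.57E+12\tASK\n"
    "BTC\t10\t8718\t1.57E+12\tASK\n"
    "BTC\t57\t8718\t1.57E+12\t\"ASK\n"
)

df = pd.read_csv(
    io.StringIO(sample),     # replace with your filename
    sep="\t",                # assumption: the columns are tab-separated
    header=None,             # assumption: the file has no header row
    engine="python",         # the parser engine suggested above
    quoting=csv.QUOTE_NONE,  # treat quote characters as plain text
    on_bad_lines="skip",     # pandas >= 1.3 replacement for error_bad_lines=False
)
```

With quoting turned off, the stray quote stays in the last field as a literal character, so you can find and clean it up afterwards.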