I have data that is over 400,000 lines long. When running this code:
f = pd.read_csv(filename, error_bad_lines=False)
I get the following error:
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 454751
My data by the end of the file looks like this:
BTC 9948 8718 1.57E+12 ASK
BTC 52 8718 1.57E+12 ASK
BTC 120 8718 1.57E+12 ASK
BTC 200 8718 1.57E+12 ASK
BTC 150 8718 1.57E+12 ASK
BTC 50 8718 1.57E+12 ASK
BTC 10 8718 1.57E+12 ASK
BTC 57 8718 1.57E+12 ASK
BTC 50 8718 1.57E+12 ASK
BTC 50191 8718
Line 454751 is this one: BTC 50 8718 1.57E+12 ASK
I tried passing error_bad_lines=False, as seen above, but that still doesn't work. I also searched my file for quotes but did not find any.
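Changing the parser engine from C to Python should solve your problem. Use the following line to read your CSV (note that error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions pass on_bad_lines="skip" instead):
f = pd.read_csv(filename, error_bad_lines=False, engine="python")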
From the read_csv documentation:
engine : {‘c’, ‘python’}, optional
Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.
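If switching engines does not help, it may be worth double-checking for a stray quote, since "EOF inside string" generally means the parser saw an opening quote character and never found the closing one. The following is only a rough diagnostic sketch, not part of pandas: find_unbalanced_quotes is a hypothetical helper, the sample data is made up to resemble the tail of the file, and it assumes the quote character is the default double quote.

```python
import io


def find_unbalanced_quotes(lines, quotechar='"'):
    """Return (line_number, text) pairs for lines holding an odd number of quote characters."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        # An odd count means a quote was opened on this line but not closed on it.
        if line.count(quotechar) % 2 == 1:
            suspects.append((number, line.rstrip("\n")))
    return suspects


# Hypothetical sample standing in for the end of the real file.
sample = io.StringIO(
    'BTC,57,8718,1.57E+12,ASK\n'
    'BTC,50,8718,1.57E+12,"ASK\n'
    'BTC,50191,8718\n'
)

suspects = find_unbalanced_quotes(sample)
print(suspects)
```

To run it on the real file, pass an open file handle instead of the sample, for example find_unbalanced_quotes(open(filename, newline="")). A field that legitimately spans several lines inside quotes will also be reported, so treat the result as a list of places to look rather than a list of errors. If the quote turns out to be ordinary data, passing quoting=csv.QUOTE_NONE to read_csv should make pandas treat quote characters as plain text.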