
How can I fix "Error tokenizing data" on pandas csv reader?

I'm trying to read a csv file with pandas.

This file actually has only one row but it causes an error whenever I try to read it.

Something seems to go wrong at line 8, but I can't find an 8th line, since there is clearly only one row in the file.

This is what I run:

with codecs.open("path_to_file", "rU", "Shift-JIS", "ignore") as file:
    df = pd.read_csv(file, header=None, sep="\t")
df

Then I get:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 3

I don't understand what is really going on, so any advice would be appreciated.

user9191983 asked Nov 12 '18 04:11



2 Answers

I struggled with this for almost half a day. I opened the CSV in Notepad, noticed that the separator is a TAB rather than a comma, and then tried the combination below.

df = pd.read_csv('C:\\myfile.csv',sep='\t', lineterminator='\r')
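
A minimal sketch of what this combination does, using made-up in-memory data rather than the asker's file (the sample string and variable names are assumptions for illustration). It assumes the rows are tab-separated and end with a bare carriage return, which is the case the `lineterminator='\r'` argument targets:

```python
import io

import pandas as pd

# Hypothetical data: tab-separated fields, each row ended by "\r" only.
raw = "a\tb\tc\rd\te\tf\r"

# sep="\t" splits fields on tabs; lineterminator="\r" splits rows on "\r".
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, lineterminator="\r")

shape = df.shape
last_value = df.iloc[1, 2]
```

Whether this matches the real file depends on its actual line endings, so it is worth checking them in an editor that can display control characters.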
Hietsh Kumar answered Sep 21 '22 21:09


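Try df = pd.read_csv(file, header=None, error_bad_lines=False). Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; on newer versions use on_bad_lines="skip" instead.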
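
A small sketch of what skipping bad lines actually does, on made-up data (the sample string and column names are assumptions, not the asker's file). It uses `on_bad_lines="skip"`, the pandas 1.3+ spelling of this option. With `header=None`, pandas takes the field count from the first line, so a later, wider line triggers the error; skipping drops that line, while naming enough columns up front keeps it:

```python
import io

import pandas as pd

# Hypothetical data: line 1 has one field, line 2 has three, line 3 has one.
raw = "title\na\tb\tc\nfooter\n"

# Default behaviour: the wider line raises ParserError.
try:
    pd.read_csv(io.StringIO(raw), header=None, sep="\t")
    message = ""
except pd.errors.ParserError as exc:
    message = str(exc)

# Skipping bad lines silently drops the three-field line.
skipped = pd.read_csv(
    io.StringIO(raw), header=None, sep="\t", on_bad_lines="skip"
)

# Declaring three columns keeps every line; short rows are padded with NaN.
kept = pd.read_csv(
    io.StringIO(raw), header=None, sep="\t", names=["c0", "c1", "c2"]
)
```

Because skipping discards the wider rows, it may throw away exactly the data you wanted, so the `names` variant is often the safer choice when the first line is shorter than the rest.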

Po Xin answered Sep 23 '22 21:09