I'm trying to read a dataset using pd.read_csv() and am getting an error. Excel can open it just fine.
reviews = pd.read_csv('br.csv')
gives the error ParserError: Error tokenizing data. C error: EOF inside string starting at line 312074
reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8')
returns ParserError: unexpected end of data
What can I do to fix this?
Edit: This is the dataset - https://www.kaggle.com/gnanesh/goodreads-book-reviews
If error_bad_lines is False, and warn_bad_lines is True, a warning for each “bad line” will be output. (Only valid with C parser).
parse_dates : boolean or list of ints or names or list of lists or dict, default False.
boolean: if True, try parsing the index.
list of ints or names: e.g. if [1, 2, 3], try parsing columns 1, 2, 3 each as a separate date column.
For me adding this fixed it:
error_bad_lines=False
It just skips the last line. So instead of
reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8')
use
reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8', error_bad_lines=False)
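A note for newer pandas versions: error_bad_lines and warn_bad_lines were deprecated in pandas 1.3 and removed in 2.0, and on_bad_lines takes their place. Here is a minimal sketch of the equivalent call, assuming pandas 1.3 or later. The sample data is made up for illustration and is not from the dataset in the question. Keep in mind that skipped rows are dropped silently, so the row count will be lower than the number of records in the file.

```python
import io

import pandas as pd

# Hypothetical sample data: the third line has one field too many.
raw = "id,review\n1,good\n2,bad,extra\n3,fine\n"

# on_bad_lines="skip" drops rows with too many fields instead of raising.
reviews = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(len(reviews))
```

Passing on_bad_lines="warn" instead should report each skipped line, which helps to see how much data is being lost.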
In my case, I don't want to skip lines, because my task requires counting the number of data records in the CSV file. The solution that works for me is to use QUOTE_NONE from the csv library. I found this on a website I no longer remember, but it works.
To describe my case: previously I had the error EOF .... Then I tried the parameter engine='python', but that introduced another bug in the next step of using the dataframe. Then I tried quoting=csv.QUOTE_NONE, and it works now. I hope this helps.
import csv
import pandas as pd
read_file = pd.read_csv(full_path, delimiter='~', encoding='utf-16 BE', header=0, quoting=csv.QUOTE_NONE)
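To illustrate what quoting=csv.QUOTE_NONE does, here is a small self-contained sketch. The sample text and column names are invented for the example; only the delimiter and the quoting argument mirror the call above. With QUOTE_NONE the parser treats a double quote as an ordinary character, so a field that opens a quote and never closes it no longer swallows the rest of the file, and every line is kept as a record.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the second record starts a quote that is never closed.
raw = 'id~review\n1~plain text\n2~"unclosed quote\n3~last row\n'

# QUOTE_NONE disables quote handling, so '"' stays in the field as literal text.
read_file = pd.read_csv(io.StringIO(raw), delimiter="~", header=0,
                        quoting=csv.QUOTE_NONE)
print(len(read_file))
print(read_file.loc[1, "review"])
```

The trade-off is that the quote characters stay in the data, and a delimiter inside a properly quoted field would now split that field, so this is probably only safe when the delimiter never appears inside the text.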