Pandas DataFrame Read Skipping line XXX: expected X fields, saw Y

Tags: python, pandas, csv

I can't figure out what's wrong with the csv file I'm trying to load:

I get error messages such as this: b'Skipping line 2120260: expected 6 fields, saw 8\n'

But when I view the lines, they look ok to me. See below -- (I am going to press enter after each tab \t to make it easier to read).

Line 2,120,260 (failing): ['user_000104\t 2005-09-12T06:25:50Z\t a019a8cf-2601-4a81-b3c3-7b279a873713\t Anne Clark\t 8f8e6bc0-c3c0-4062-875a-773a1de6206f\t Empty Me']

Line 9,000 (not failing): ['user_000001\t 2008-06-15T17:28:31Z\t a3031680-c359-458f-a641-70ccbaec6a74\t Steve Reich\t 2991db42-3b19-4344-a340-605ac3fbd7e9\t Drumming: Part Iv']

If anyone wants to try it out for themselves, download this:

http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html

and run:

inpFile2 = pd.read_csv(fPath, sep='\t', error_bad_lines=False)

to generate the error. And:

def checkRow(path, N):
    with open(path, 'r') as f:
        print("This is the line.")
        print(next(itertools.islice(csv.reader(f), N, None)))

to view the error row (pass in the file path and the row you are interested in). Make sure you import csv and import itertools.
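One caveat with checkRow: csv.reader defaults to a comma delimiter and treats double quotes specially, so the row it returns may not be the same physical line that pandas complains about. A minimal sketch that reads the raw line and splits on tabs only is shown here. The name show_raw_line is hypothetical, and n is a 0-based index, so how it lines up with the number in the pandas message is an assumption worth checking against the neighbouring lines.

```python
import itertools


def show_raw_line(path, n):
    """Return physical line n (0-based) of a file, split on tabs only.

    No quote handling is applied, so a stray double quote stays visible
    in the returned fields instead of changing how the line is split.
    """
    with open(path, "r", encoding="utf-8") as f:
        # islice skips the first n lines without loading the whole file.
        line = next(itertools.islice(f, n, None))
    return line.rstrip("\r\n").split("\t")
```

Calling show_raw_line(fPath, N) for a few values of N around the reported line number should show which line contains the quote character.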

asked May 10 '17 11:05

user1761806

2 Answers

OK, I managed to get to the bottom of it.

The solution is to pass quoting=csv.QUOTE_NONE as a parameter to read_csv:

inpFile = pd.read_csv(fPath, sep='\t', error_bad_lines=False, quoting=csv.QUOTE_NONE)

The reason is that one of the fields contains a double quote, which confuses pandas, so you need to tell it not to treat quote characters specially. With the above change, the file seems to load.
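To illustrate the mechanism, here is a small sketch using the standard library's csv module as a stand-in for the pandas parser (an assumption: the two parsers are separate implementations, but both accept the same csv.QUOTE_NONE constant). The sample rows and the field_counts helper are made up for illustration. With default quoting, a field that starts with a double quote swallows tabs and newlines until the next quote, so two lines merge into one row with an extra field.

```python
import csv
import io

# Hypothetical three-column, tab-separated sample. The second line has a
# field that starts with a double quote and never closes it on that line.
SAMPLE = "\n".join([
    "u1\tAnne Clark\tEmpty Me",
    "u2\tArtist B\t\"Song",
    "u3\t\"Artist C\"\tOther Song",
    "u4\tArtist D\tPlain",
]) + "\n"


def field_counts(text, quoting):
    """Return the number of fields in each parsed row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting)
    return [len(row) for row in reader]


# Default quoting: lines 2 and 3 are merged into a single, longer row.
default_counts = field_counts(SAMPLE, csv.QUOTE_MINIMAL)

# QUOTE_NONE: quotes are ordinary characters, so every line is one row.
raw_counts = field_counts(SAMPLE, csv.QUOTE_NONE)

print(default_counts)
print(raw_counts)
```

The merged row is the kind of line that produces an "expected X fields, saw Y" message, even though each physical line looks fine on its own.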

answered Sep 18 '22 16:09

user1761806

If you simply want to "hide" the warnings about bad rows, you can pass warn_bad_lines=False (the default is True). More info here: pandas.pydata.org/pandas-docs
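Note that error_bad_lines and warn_bad_lines were deprecated in pandas 1.3 and removed in 2.0, replaced by a single on_bad_lines parameter that takes 'error', 'warn' or 'skip'. A minimal sketch, assuming pandas 1.3 or later and using made-up sample data, of silently skipping a row with too many fields:

```python
import io

import pandas as pd

# Hypothetical sample: the second data line has four fields instead of three.
sample = "a\tb\tc\n1\t2\t3\n4\t5\t6\t7\n8\t9\t10\n"

# on_bad_lines="skip" drops rows with too many fields without a warning;
# on_bad_lines="warn" would skip them and emit a warning instead.
df = pd.read_csv(io.StringIO(sample), sep="\t", on_bad_lines="skip")

print(df)
```

Skipping only hides the symptom. If the bad rows are caused by stray quote characters, rows are still being lost.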

answered Sep 17 '22 16:09

Lorenzo Bassetti
