I have a CSV file with a few hundred rows and 26 columns, but the last few columns only have a value in a few rows, and those rows are towards the middle or end of the file. When I try to read it in using read_csv() I get the following error: "ValueError: Expecting 23 columns, got 26 in row 64"
I can't see where to explicitly state the number of columns in the file, or how it determines how many columns it thinks the file should have. The dump is below.
    In [3]: infile = open(easygui.fileopenbox(), "r")
            pledge = read_csv(infile, parse_dates='true')

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-3-b35e7a16b389> in <module>()
          1 infile =open(easygui.fileopenbox(),"r")
          2
    ----> 3 pledge = read_csv(infile,parse_dates='true')

    C:\Python27\lib\site-packages\pandas-0.8.1-py2.7-win32.egg\pandas\io\parsers.pyc in read_csv(filepath_or_buffer, sep, dialect, header, index_col, names, skiprows, na_values, thousands, comment, parse_dates, keep_date_col, dayfirst, date_parser, nrows, iterator, chunksize, skip_footer, converters, verbose, delimiter, encoding, squeeze)
        234         kwds['delimiter'] = sep
        235
    --> 236     return _read(TextParser, filepath_or_buffer, kwds)
        237
        238 @Appender(_read_table_doc)

    C:\Python27\lib\site-packages\pandas-0.8.1-py2.7-win32.egg\pandas\io\parsers.pyc in _read(cls, filepath_or_buffer, kwds)
        189         return parser
        190
    --> 191     return parser.get_chunk()
        192
        193 @Appender(_read_csv_doc)

    C:\Python27\lib\site-packages\pandas-0.8.1-py2.7-win32.egg\pandas\io\parsers.pyc in get_chunk(self, rows)
        779             msg = ('Expecting %d columns, got %d in row %d' %
        780                    (col_len, zip_len, row_num))
    --> 781             raise ValueError(msg)
        782
        783         data = dict((k, v) for k, v in izip(self.columns, zipped_content))

    ValueError: Expecting 23 columns, got 26 in row 64
The default value of the sep parameter is the comma (,), which means that if we don't pass sep to read_csv(), pandas assumes the file uses a comma as the delimiter.
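For example, these two calls parse the same file identically (a minimal sketch; 'data.csv' is just a placeholder file name):

    import pandas as pd

    # sep=',' is the default, so both calls treat the file as comma-separated.
    df_default = pd.read_csv('data.csv')
    df_explicit = pd.read_csv('data.csv', sep=',')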
A related read_csv option is infer_datetime_format: if True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns and, if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
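A minimal sketch of using it together with parse_dates (the file name and date column name below are made up; note that parse_dates is normally given True or a list of column names, not the string 'true' used in the question):

    import pandas as pd

    # parse_dates names the columns to convert to datetimes;
    # infer_datetime_format=True lets pandas guess the format once and reuse it.
    pledge = pd.read_csv('pledge.csv', parse_dates=['pledge_date'],
                         infer_datetime_format=True)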
You can use the names parameter. For example, if you have a CSV file like this:
    1,2,1
    2,3,4,2,3
    1,2,3,3
    1,2,3,4,5,6

and try to read it, you'll receive an error:
    >>> pd.read_csv(r'D:/Temp/tt.csv')
    Traceback (most recent call last):
    ...
    Expected 5 fields in line 4, saw 6

But if you pass the names parameter, you'll get a result:
    >>> pd.read_csv(r'D:/Temp/tt.csv', names=list('abcdef'))
       a  b  c   d   e   f
    0  1  2  1 NaN NaN NaN
    1  2  3  4   2   3 NaN
    2  1  2  3   3 NaN NaN
    3  1  2  3   4   5   6

Hope it helps.
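Applied to the original question, a possible sketch (the file name and generated column names are made up, and it assumes the file's own header row only names the first 23 columns) is to supply all 26 names yourself and tell pandas to discard the partial header:

    import pandas as pd

    # Provide names for all 26 columns so rows with extra trailing fields still fit;
    # header=0 says the first line is a header row to be replaced by these names.
    col_names = ['col%d' % i for i in range(26)]
    pledge = pd.read_csv('pledge.csv', names=col_names, header=0)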
You can also load the CSV with the separator '^' so that each entire line lands in a single column, then use split to break that string on the real delimiter. After that, do a concat to merge the split columns back with the original dataframe (if needed).
    import pandas as pd

    # Read each line as one string column ('^' is assumed not to occur in the data)
    temp = pd.read_csv('test.csv', sep='^', header=None, prefix='X')
    # Split that single column on commas into separate columns
    temp2 = temp.X0.str.split(',', expand=True)
    del temp['X0']
    temp = pd.concat([temp, temp2], axis=1)
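If the file's first line is a header row (as it appears to be in the question), a possible follow-on step, sketched here, is to promote that row to column names after the split (names for trailing columns that the header doesn't cover come out as None):

    # Hypothetical follow-up: use the first split row as the header,
    # then drop it from the data.
    temp.columns = temp.iloc[0]
    temp = temp.iloc[1:].reset_index(drop=True)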