I have a CSV file with a few hundred rows and 26 columns, but the last few columns only have a value in a few rows, and those rows are towards the middle or end of the file. When I try to read it in using read_csv() I get the following error: "ValueError: Expecting 23 columns, got 26 in row 64"
I can't see where to explicitly state the number of columns in the file, or how it determines how many columns it thinks the file should have. The dump is below.
    In [3]: infile = open(easygui.fileopenbox(), "r")
            pledge = read_csv(infile, parse_dates='true')

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-3-b35e7a16b389> in <module>()
          1 infile =open(easygui.fileopenbox(),"r")
          2
    ----> 3 pledge = read_csv(infile,parse_dates='true')

    C:\Python27\lib\site-packages\pandas-0.8.1-py2.7-win32.egg\pandas\io\parsers.pyc in read_csv(filepath_or_buffer, sep, dialect, header, index_col, names, skiprows, na_values, thousands, comment, parse_dates, keep_date_col, dayfirst, date_parser, nrows, iterator, chunksize, skip_footer, converters, verbose, delimiter, encoding, squeeze)
        234         kwds['delimiter'] = sep
        235
    --> 236     return _read(TextParser, filepath_or_buffer, kwds)
        237
        238 @Appender(_read_table_doc)

    C:\Python27\lib\site-packages\pandas-0.8.1-py2.7-win32.egg\pandas\io\parsers.pyc in _read(cls, filepath_or_buffer, kwds)
        189         return parser
        190
    --> 191     return parser.get_chunk()
        192
        193 @Appender(_read_csv_doc)

    C:\Python27\lib\site-packages\pandas-0.8.1-py2.7-win32.egg\pandas\io\parsers.pyc in get_chunk(self, rows)
        779             msg = ('Expecting %d columns, got %d in row %d' %
        780                    (col_len, zip_len, row_num))
    --> 781             raise ValueError(msg)
        782
        783         data = dict((k, v) for k, v in izip(self.columns, zipped_content))

    ValueError: Expecting 23 columns, got 26 in row 64
The default value of the sep parameter is the comma (,), which means that if we don't pass sep to read_csv(), pandas assumes the file uses a comma as the delimiter.
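For example, these two calls parse the same file identically (a minimal sketch; 'data.csv' is just a placeholder file name):

    import pandas as pd

    # sep=',' is the default, so both calls treat the file as comma-separated.
    df_default = pd.read_csv('data.csv')
    df_explicit = pd.read_csv('data.csv', sep=',')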
A related read_csv option is infer_datetime_format: if True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns and, if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
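A minimal sketch of using it together with parse_dates (the file name and date column name below are made up; note that parse_dates is normally given True or a list of column names, not the string 'true' used in the question):

    import pandas as pd

    # parse_dates names the columns to convert to datetimes;
    # infer_datetime_format=True lets pandas guess the format once and reuse it.
    pledge = pd.read_csv('pledge.csv', parse_dates=['pledge_date'],
                         infer_datetime_format=True)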
You can use the names parameter. For example, if you have a CSV file like this:
    1,2,1
    2,3,4,2,3
    1,2,3,3
    1,2,3,4,5,6

and try to read it, you'll receive an error:
    >>> pd.read_csv(r'D:/Temp/tt.csv')
    Traceback (most recent call last):
    ...
    Expected 5 fields in line 4, saw 6

But if you pass the names parameter, you'll get a result:
    >>> pd.read_csv(r'D:/Temp/tt.csv', names=list('abcdef'))
       a  b  c   d   e   f
    0  1  2  1 NaN NaN NaN
    1  2  3  4   2   3 NaN
    2  1  2  3   3 NaN NaN
    3  1  2  3   4   5   6

Hope it helps.
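Applied to the original question, a possible sketch (the file name and generated column names are made up, and it assumes the file's own header row only names the first 23 columns) is to supply all 26 names yourself and tell pandas to discard the partial header:

    import pandas as pd

    # Provide names for all 26 columns so rows with extra trailing fields still fit;
    # header=0 says the first line is a header row to be replaced by these names.
    col_names = ['col%d' % i for i in range(26)]
    pledge = pd.read_csv('pledge.csv', names=col_names, header=0)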
You can also load the CSV with the separator '^' so that each entire line lands in a single column, then use split to break that string on the real delimiter. After that, do a concat to merge the split columns back with the original dataframe (if needed).
    import pandas as pd

    # Read each line as one string column ('^' is assumed not to occur in the data)
    temp = pd.read_csv('test.csv', sep='^', header=None, prefix='X')
    # Split that single column on commas into separate columns
    temp2 = temp.X0.str.split(',', expand=True)
    del temp['X0']
    temp = pd.concat([temp, temp2], axis=1)
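If the file's first line is a header row (as it appears to be in the question), a possible follow-on step, sketched here, is to promote that row to column names after the split (names for trailing columns that the header doesn't cover come out as None):

    # Hypothetical follow-up: use the first split row as the header,
    # then drop it from the data.
    temp.columns = temp.iloc[0]
    temp = temp.iloc[1:].reset_index(drop=True)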