I have a csv file that has a few hundred rows and 26 columns, but the last few columns only have a value in a few rows, and those rows are towards the middle or end of the file. When I try to read it in using read_csv(), I get the following error: "ValueError: Expecting 23 columns, got 26 in row 64"
I can't see where to explicitly state the number of columns in the file, or how it determines how many columns it thinks the file should have. The traceback is below.
In [3]: infile =open(easygui.fileopenbox(),"r")
        pledge = read_csv(infile,parse_dates='true')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-b35e7a16b389> in <module>()
      1 infile =open(easygui.fileopenbox(),"r")
      2
----> 3 pledge = read_csv(infile,parse_dates='true')

C:\Python27\lib\site-packages\pandas-0.8.1-py2.7-win32.egg\pandas\io\parsers.pyc in read_csv(filepath_or_buffer, sep, dialect, header, index_col, names, skiprows, na_values, thousands, comment, parse_dates, keep_date_col, dayfirst, date_parser, nrows, iterator, chunksize, skip_footer, converters, verbose, delimiter, encoding, squeeze)
    234         kwds['delimiter'] = sep
    235
--> 236     return _read(TextParser, filepath_or_buffer, kwds)
    237
    238 @Appender(_read_table_doc)

C:\Python27\lib\site-packages\pandas-0.8.1-py2.7-win32.egg\pandas\io\parsers.pyc in _read(cls, filepath_or_buffer, kwds)
    189         return parser
    190
--> 191     return parser.get_chunk()
    192
    193 @Appender(_read_csv_doc)

C:\Python27\lib\site-packages\pandas-0.8.1-py2.7-win32.egg\pandas\io\parsers.pyc in get_chunk(self, rows)
    779             msg = ('Expecting %d columns, got %d in row %d' %
    780                    (col_len, zip_len, row_num))
--> 781             raise ValueError(msg)
    782
    783         data = dict((k, v) for k, v in izip(self.columns, zipped_content))

ValueError: Expecting 23 columns, got 26 in row 64
Idea #2: Use CSVs rather than Excel Files
Importing csv files in Python is 100x faster than Excel files. We can now load these files in 0.63 seconds. That's nearly 10 times faster!
The default value of the sep parameter is the comma (,) which means if we don't specify the sep parameter in our read_csv() function, it is understood that our file is using comma as the delimiter.
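To see what that default means in practice, here is a small sketch. The sample text and variable names are made up for illustration, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample data: the same table, once comma-delimited, once semicolon-delimited
comma_text = "a,b,c\n1,2,3\n"
semicolon_text = "a;b;c\n1;2;3\n"

# No sep given: pandas splits on commas
default_df = pd.read_csv(io.StringIO(comma_text))

# No sep given on semicolon data: each line ends up in a single column
wrong_df = pd.read_csv(io.StringIO(semicolon_text))

# Explicit sep: the three columns are recovered
right_df = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(default_df.shape, wrong_df.shape, right_df.shape)
```

If a file unexpectedly comes back with a single wide column, a mismatched delimiter is one of the first things worth checking.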
If infer_datetime_format is True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
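For reference, a minimal sketch of date parsing on its own. The column names and data are invented for illustration. It passes parse_dates as a list of column names and leaves out the format-inference flag described above (infer_datetime_format), which as far as I know is deprecated in pandas 2.x.

```python
import io
import pandas as pd

# Hypothetical sample data with one date column
text = "when,amount\n2012-08-01,10\n2012-08-15,25\n"

# parse_dates takes a list of column names (or a bool), not the string 'true'
df = pd.read_csv(io.StringIO(text), parse_dates=["when"])

print(df.dtypes)
```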
You can use the names parameter. For example, if you have a csv file like this:
1,2,1
2,3,4,2,3
1,2,3,3
1,2,3,4,5,6
And try to read it, you'll receive an error:
>>> pd.read_csv(r'D:/Temp/tt.csv')
Traceback (most recent call last):
...
Expected 5 fields in line 4, saw 6
But if you pass the names parameter, you'll get a result:
>>> pd.read_csv(r'D:/Temp/tt.csv', names=list('abcdef'))
   a  b  c   d   e   f
0  1  2  1 NaN NaN NaN
1  2  3  4   2   3 NaN
2  1  2  3   3 NaN NaN
3  1  2  3   4   5   6
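If you don't know the widest row in advance, one possible approach is to count the fields first and generate the names from that. This is only a sketch: the col0, col1, ... labels are hypothetical, io.StringIO stands in for the real file, and it assumes the file has no header row.

```python
import csv
import io
import pandas as pd

# Same ragged data as in the example above, held in memory instead of a file
raw = "1,2,1\n2,3,4,2,3\n1,2,3,3\n1,2,3,4,5,6\n"

# First pass: count the fields in every row with the stdlib csv module
max_fields = max(len(row) for row in csv.reader(io.StringIO(raw)))

# Hypothetical placeholder labels, one per column of the widest row
col_names = [f"col{i}" for i in range(max_fields)]

# Second pass: with enough names supplied, short rows are padded with NaN
df = pd.read_csv(io.StringIO(raw), header=None, names=col_names)

print(df)
```

With a real file you would open it twice (or seek back to the start) rather than build two StringIO objects.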
Hope it helps.
You can also load the CSV with the separator '^' (or any other character that does not occur in the data), so that each whole line is loaded into a single column, and then use split to break that string on the real delimiter. After that, you do a concat to merge the result with the original dataframe (if needed).
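import pandas as pd
# '^' does not occur in the data, so each line is read into the single column X0
# (names=['X0'] is used here because the prefix argument was removed in pandas 2.0)
temp = pd.read_csv('test.csv', sep='^', header=None, names=['X0'])
# Split each line on the real delimiter; short rows are padded with missing values
temp2 = temp.X0.str.split(',', expand=True)
del temp['X0']
temp = pd.concat([temp, temp2], axis=1)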
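Here is a self-contained version of the same idea that runs without a file on disk. Treat it as a sketch: the sample data and the column name "line" are made up for illustration, and the to_numeric step is an extra I added because split leaves every value as a string.

```python
import io
import pandas as pd

# Hypothetical ragged data held in memory instead of test.csv
raw = "1,2,1\n2,3,4,2,3\n1,2,3,3\n1,2,3,4,5,6\n"

# '^' never appears in the data, so every line becomes one string in one column
lines = pd.read_csv(io.StringIO(raw), sep="^", header=None, names=["line"])

# Split on the real delimiter; shorter rows get missing values on the right
parts = lines["line"].str.split(",", expand=True)

# The pieces are still strings, so convert each column to numbers
numeric = parts.apply(pd.to_numeric)

print(numeric)
```

The trade-off is that pandas no longer does any type inference or quoting for you, so this is mostly useful when the rows are too irregular for names to handle.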