
Pandas returns "Passed header names mismatches usecols" error

Tags: python, pandas

The following works as expected. There are 190 columns that are all read in perfectly.

pd.read_csv("data.csv", 
             header=None,
             names=columns,
             # usecols=columns[:10], 
             nrows=10
             )

I have used the usecols argument before, so I am perplexed as to why it is not working here. I expected that passing a slice of the first 10 column names would just work, but as soon as usecols is enabled (second snippet below) I get the "Passed header names mismatches usecols" error.

I am using pandas 0.16.2.

pd.read_csv("data.csv", 
             header=None,
             names=columns,
             usecols=columns[:10], 
             nrows=10
             )

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44> in <module>()
      3                     nrows=10,
      4                     header=None,
----> 5                     names=columns,
      6                     )

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    472                     skip_blank_lines=skip_blank_lines)
    473 
--> 474         return _read(filepath_or_buffer, kwds)
    475 
    476     parser_f.__name__ = name

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    248 
    249     # Create the parser.
--> 250     parser = TextFileReader(filepath_or_buffer, **kwds)
    251 
    252     if (nrows is not None) and (chunksize is not None):

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    564             self.options['has_index_names'] = kwds['has_index_names']
    565 
--> 566         self._make_engine(self.engine)
    567 
    568     def _get_options_with_defaults(self, engine):

/.../m9tn/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    703     def _make_engine(self, engine='c'):
    704         if engine == 'c':
--> 705             self._engine = CParserWrapper(self.f, **self.options)
    706         else:
    707             if engine == 'python':

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
   1070         kwds['allow_leading_cols'] = self.index_col is not False
   1071 
-> 1072         self._reader = _parser.TextReader(src, **kwds)
   1073 
   1074         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4732)()

pandas/parser.pyx in pandas.parser.TextReader._get_header (pandas/parser.c:7330)()

ValueError: Passed header names mismatches usecols
Jason Sanchez asked Jun 24 '15 04:06


1 Answer

It turns out there were 191 columns in the dataset, not 190. Because names had one entry fewer than the file had columns, pandas silently used my first column of data as the index. I still don't know exactly why that made usecols fail, since every column listed in usecols was present in the parsed dataset.

So the solution is to confirm that the number of entries in names exactly matches the number of columns in your dataset.
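
One way to run that check before calling read_csv is to count the fields in the first row of the file and compare the result with len(columns). The following is only a minimal sketch, assuming Python 3 and a plain comma-delimited file with no header row. The helper name count_csv_columns and the in-memory sample data are made up for illustration and are not part of pandas.

```python
import csv
import io


def count_csv_columns(handle, delimiter=","):
    """Return the number of fields in the first row of an open CSV file."""
    first_row = next(csv.reader(handle, delimiter=delimiter))
    return len(first_row)


# Hypothetical stand-in for data.csv: three columns, no header row.
sample = io.StringIO("1,2,3\n4,5,6\n")
columns = ["a", "b"]  # one name too few, mirroring the 190-vs-191 mismatch

n_cols = count_csv_columns(sample)
names_match = (n_cols == len(columns))
print(n_cols, len(columns), names_match)
```

With a real file, the handle would come from something like open("data.csv", newline=""). If names_match is False, fix the names list first, then pass names and usecols to read_csv.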

Also, I found this discussion on GitHub.

Jason Sanchez answered Sep 24 '22 01:09