The following works as expected. There are 190 columns that are all read in perfectly.
pd.read_csv("data.csv",
            header=None,
            names=columns,
            # usecols=columns[:10],
            nrows=10)
I have used the usecols argument before, so I am perplexed as to why it no longer works for me. I would expect that simply slicing the first 10 column names would work, but I keep getting the "Passed header names mismatches usecols" error.
I am using pandas 0.16.2.
pd.read_csv("data.csv",
            header=None,
            names=columns,
            usecols=columns[:10],
            nrows=10)
Traceback
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-44> in <module>()
3 nrows=10,
4 header=None,
----> 5 names=columns,
6 )
/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
472 skip_blank_lines=skip_blank_lines)
473
--> 474 return _read(filepath_or_buffer, kwds)
475
476 parser_f.__name__ = name
/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
248
249 # Create the parser.
--> 250 parser = TextFileReader(filepath_or_buffer, **kwds)
251
252 if (nrows is not None) and (chunksize is not None):
/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
564 self.options['has_index_names'] = kwds['has_index_names']
565
--> 566 self._make_engine(self.engine)
567
568 def _get_options_with_defaults(self, engine):
/.../m9tn/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
703 def _make_engine(self, engine='c'):
704 if engine == 'c':
--> 705 self._engine = CParserWrapper(self.f, **self.options)
706 else:
707 if engine == 'python':
/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
1070 kwds['allow_leading_cols'] = self.index_col is not False
1071
-> 1072 self._reader = _parser.TextReader(src, **kwds)
1073
1074 # XXX
pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4732)()
pandas/parser.pyx in pandas.parser.TextReader._get_header (pandas/parser.c:7330)()
ValueError: Passed header names mismatches usecols
It turns out there were 191 columns in the dataset, not 190. Because names had one entry fewer than the file had fields, pandas automatically used my first column of data as the index. I still don't quite know why that made it error out, since all of the columns in usecols were in fact present in the parsed dataset.
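A small sketch of the index shift described above. It was written against a recent pandas, not 0.16.2, so it shows only the implicit-index behavior and does not reproduce the old `usecols` error. The three-field string `raw` and the names `a`, `b`, `x` are made up to stand in for the 191-column file.

```python
import io

import pandas as pd

# Hypothetical 3-field data standing in for the real file.
raw = "1,2,3\n4,5,6\n"

# Two names for three fields: pandas uses the extra leading field as the index.
short = pd.read_csv(io.StringIO(raw), header=None, names=["a", "b"])
print(short.index.tolist())    # [1, 4]
print(short.columns.tolist())  # ['a', 'b']

# One name per field: default integer index, and usecols selects by name.
full = pd.read_csv(io.StringIO(raw), header=None,
                   names=["x", "a", "b"], usecols=["x", "a"])
print(full.index.tolist())     # [0, 1]
print(full.columns.tolist())   # ['x', 'a']
```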
So the solution is to make sure the number of entries in names exactly matches the number of columns in your file.
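One way to do that check before calling `read_csv` is to count the fields in the first row with the standard-library `csv` module. This is a minimal sketch: `count_fields` and `check_names` are hypothetical helpers, and they assume the first row has the same number of fields as the rest of the file.

```python
import csv
import io


def count_fields(lines, delimiter=","):
    """Return the number of fields in the first row of CSV text.

    `lines` is any iterable of strings, such as an open file object.
    Assumes the first row is representative of the whole file.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    first_row = next(reader, [])
    return len(first_row)


def check_names(lines, names, delimiter=","):
    """Raise ValueError if len(names) differs from the file's field count."""
    n_fields = count_fields(lines, delimiter)
    if n_fields != len(names):
        raise ValueError(
            "names has %d entries but the file has %d fields"
            % (len(names), n_fields)
        )


# Hypothetical example: 3 fields in the data, only 2 names supplied.
sample = io.StringIO("1,2,3\n4,5,6\n")
try:
    check_names(sample, ["a", "b"])
except ValueError as exc:
    print(exc)  # names has 2 entries but the file has 3 fields
```

For the original file, opening "data.csv" and calling `check_names(f, columns)` before the `pd.read_csv` call would flag the 190-vs-191 mismatch up front.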
Also, I found this discussion on GitHub.