I want to read a list of CSVs into a dataframe. However, I'm having trouble catching an error that occurs when the file has header rows that do not match the data itself (i.e. metadata or additional blank rows). This error is a 'CParserError' (see my error messages at the bottom).
My current solution is to use a try-except statement, with
try:
    # read file
except CParserError:
    # give me an error message
However, this fails with the below error:
NameError: name 'CParserError' is not defined
My code is below. As you can see, I think I need multiple except clauses to catch the various errors. The first should handle the encoding (the files will never be anything other than utf-8 or latin-1). If there are header rows, pd.read_csv raises a 'CParserError' (see below), which I need to catch. Finally, if there are any other miscellaneous issues, I want to catch those too.
Any solutions are welcome, ideally ones that explain why CParserError isn't recognised, or how the try-except logic could be amended to avoid relying on it.
Thanks.
import glob
import pandas as pd

files_list = glob.glob('*.csv*')  # get all CSVs
files_dict = {}
for file in files_list:
    try:
        # try the default encoding first
        files_dict[file] = pd.read_csv(file, encoding='utf-8')
    except UnicodeDecodeError:
        # fall back to latin-1
        files_dict[file] = pd.read_csv(file, encoding='latin-1')
    except CParserError:
        print(file, 'failed: check for header rows')
    except Exception:
        print(file, 'failed: some other error occurred')
The error message when trying to parse a CSV file with extra header rows:
CParserError Traceback (most recent call last)
<ipython-input-15-e454c053d675> in <module>()
----> 1 pd.read_csv('DFA_me_week27.csv')
C:\Users\john.lwli\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
463 skip_blank_lines=skip_blank_lines)
464
--> 465 return _read(filepath_or_buffer, kwds)
466
467 parser_f.__name__ = name
C:\Users\john.lwli\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
249 return parser
250
--> 251 return parser.read()
252
253 _parser_defaults = {
C:\Users\john.lwli\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
708 raise ValueError('skip_footer not supported for iteration')
709
--> 710 ret = self._engine.read(nrows)
711
712 if self.options.get('as_recarray'):
C:\Users\john.lwli\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
1157
1158 try:
-> 1159 data = self._reader.read(nrows)
1160 except StopIteration:
1161 if nrows is None:
pandas\parser.pyx in pandas.parser.TextReader.read (pandas\parser.c:7403)()
pandas\parser.pyx in pandas.parser.TextReader._read_low_memory (pandas\parser.c:7643)()
pandas\parser.pyx in pandas.parser.TextReader._read_rows (pandas\parser.c:8260)()
pandas\parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:8134)()
pandas\parser.pyx in pandas.parser.raise_parser_error (pandas\parser.c:20720)()
CParserError: Error tokenizing data. C error: Expected 2 fields in line 12, saw 12
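For reference, the message says the parser expected 2 fields, based on the earlier lines of the file, and then found 12 on line 12. Here is a minimal sketch that reproduces the same kind of failure, assuming a made-up file with three two-field metadata lines above the real table. The contents of `raw` are hypothetical, and `pandas.errors.ParserError` is the name this exception goes by in pandas 0.20 and later:

```python
import io

import pandas as pd
from pandas.errors import ParserError

# Hypothetical file content: three metadata lines, then the real table.
raw = (
    "Report,week27\n"
    "Author,someone\n"
    "Date,2017-07-01\n"
    "a,b,c,d\n"
    "1,2,3,4\n"
)

try:
    pd.read_csv(io.StringIO(raw))
    failed = False
except ParserError as exc:
    # The two-field metadata lines set the expected width,
    # so the four-field table header is rejected.
    failed = True
    message = str(exc)

# Skipping the metadata lines lets pandas find the real header.
df = pd.read_csv(io.StringIO(raw), skiprows=3)
print(failed, list(df.columns))
```

If the number of metadata lines is known in advance, `skiprows` avoids the exception altogether. Otherwise the exception still has to be caught.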
I tried
from pandas.parser import CParserError
and got
FutureWarning: The pandas.parser module is deprecated and will be removed in a future version. Please import from the pandas.io.parser instead
so the warning recommends
from pandas.io.parser import CParserError
I'm using Python 3.6 and pandas 0.20.3. However, that import fails with
ModuleNotFoundError: No module named 'pandas.io.parser'
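The NameError in the question arises because `CParserError` is never imported: an `except` clause can only refer to a name that exists in the current scope. As far as I know, from pandas 0.20 onward the exception is exposed as `pandas.errors.ParserError`, so importing it from there sidesteps both the deprecated `pandas.parser` module and the `pandas.io.parser` path that the warning points to but which does not exist. Below is a rough sketch of the loop under that assumption. The helper names `read_csv_with_fallback` and `load_all` are made up for illustration:

```python
import glob

import pandas as pd
from pandas.errors import ParserError  # assumed available (pandas >= 0.20)


def read_csv_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: try each encoding in turn, return a DataFrame."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as exc:
            last_error = exc  # wrong encoding, try the next one
    raise last_error


def load_all(pattern="*.csv*"):
    """Hypothetical helper: read every matching file, reporting failures."""
    files_dict = {}
    for path in glob.glob(pattern):
        try:
            files_dict[path] = read_csv_with_fallback(path)
        except ParserError:
            print(path, "failed: check for header rows")
        except Exception as exc:
            print(path, "failed:", exc)
    return files_dict
```

Putting the encoding fallback in its own function means a parser error raised during the latin-1 attempt is also caught. In the original layout, the second `read_csv` call sits inside an `except` block, where the sibling `except` clauses no longer apply.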