
Error in reading a CSV file in pandas [CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.]

Tags: python, pandas, csv

So I tried reading all the CSV files from a folder, concatenating them into one big CSV (the structure of all the files was the same), saving it, and reading it back. All of this was done using pandas. The error occurs while reading it back. I am attaching the code and the error below.

import pandas as pd
import numpy as np
import glob

path = r'somePath'  # use your path
allFiles = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_, index_col=None, header=0)
    list_.append(df)
store = pd.concat(list_)
store.to_csv("C:\work\DATA\Raw_data\\store.csv", sep=',', index=False)
store1 = pd.read_csv("C:\work\DATA\Raw_data\\store.csv", sep=',')

Error:-

CParserError                              Traceback (most recent call last)
<ipython-input-48-2983d97ccca6> in <module>()
----> 1 store1 = pd.read_csv("C:\work\DATA\Raw_data\\store.csv", sep=',')

C:\Users\armsharm\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, ...)
    472                     skip_blank_lines=skip_blank_lines)
    473
--> 474         return _read(filepath_or_buffer, kwds)
    475
    476     parser_f.__name__ = name

C:\Users\armsharm\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
    258         return parser
    259
--> 260     return parser.read()
    261
    262 _parser_defaults = {

C:\Users\armsharm\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\parsers.pyc in read(self, nrows)
    719                 raise ValueError('skip_footer not supported for iteration')
    720
--> 721         ret = self._engine.read(nrows)
    722
    723         if self.options.get('as_recarray'):

C:\Users\armsharm\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\parsers.pyc in read(self, nrows)
   1168
   1169         try:
-> 1170             data = self._reader.read(nrows)
   1171         except StopIteration:
   1172             if nrows is None:

pandas\parser.pyx in pandas.parser.TextReader.read (pandas\parser.c:7544)()
pandas\parser.pyx in pandas.parser.TextReader._read_low_memory (pandas\parser.c:7784)()
pandas\parser.pyx in pandas.parser.TextReader._read_rows (pandas\parser.c:8401)()
pandas\parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:8275)()
pandas\parser.pyx in pandas.parser.raise_parser_error (pandas\parser.c:20691)()

CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

I tried using the csv reader as well:

import csv

with open("C:\work\DATA\Raw_data\\store.csv", 'rb') as f:
    reader = csv.reader(f)
    l = list(reader)

Error:-

Error                                     Traceback (most recent call last)
<ipython-input-36-9249469f31a6> in <module>()
      1 with open('C:\work\DATA\Raw_data\\store.csv', 'rb') as f:
      2     reader = csv.reader(f)
----> 3     l = list(reader)

Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
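As an aside on the question's code: strings like "C:\work\DATA\Raw_data\\store.csv" mix escaped and unescaped backslashes, which is fragile on Windows. A minimal, runnable sketch of the same read-and-concatenate loop, using os.path.join to sidestep the escaping issue (the throwaway folder and sample files here are invented so the sketch runs anywhere; in practice, point path at your own data folder):

```python
import glob
import os
import tempfile

import pandas as pd

# Throwaway folder with two small sample CSVs, purely for illustration
path = tempfile.mkdtemp()
for name in ("a.csv", "b.csv"):
    with open(os.path.join(path, name), "w") as fh:
        fh.write("col1,col2\n1,2\n3,4\n")

# os.path.join avoids hand-written backslash escapes in Windows paths
all_files = glob.glob(os.path.join(path, "*.csv"))

# Read each file, then concatenate once at the end
frames = [pd.read_csv(f, index_col=None, header=0) for f in all_files]
store = pd.concat(frames, ignore_index=True)
store.to_csv(os.path.join(path, "store.csv"), index=False)

# Read the combined file back
store1 = pd.read_csv(os.path.join(path, "store.csv"))
```

Alternatively, raw strings (r"C:\work\...") keep backslashes literal and avoid the same pitfall.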
Arman Sharma asked Nov 30 '15 12:11



2 Answers

I ran into this error too; the cause was stray carriage returns ("\r") in the data, which pandas was treating as line terminators as if they were "\n". I thought I'd post here since that may be a common reason this error comes up.

The solution I found was to pass lineterminator='\n' to the read_csv call, like this:

df_clean = pd.read_csv('test_error.csv', lineterminator='\n')
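To see the failure mode concretely, here is a small sketch (the file contents are invented for illustration). With default settings, the C parser treats a bare "\r" as a row break, silently splitting the row; forcing lineterminator='\n' keeps the "\r" inside the field, where it can then be scrubbed:

```python
import io

import pandas as pd

# Invented sample: the first data row has a stray "\r" inside a field
raw = "name,value\nfoo\rbar,1\nbaz,2\n"

# With lineterminator="\n", only "\n" ends a row; the "\r" stays in the field
df = pd.read_csv(io.StringIO(raw), lineterminator="\n")

# Scrub the leftover "\r" characters from the string column
df["name"] = df["name"].str.replace("\r", "", regex=False)
```

Note that lineterminator is only supported by the C parser engine.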
Louise Fallon answered Sep 20 '22 22:09


If you are using Python and it's a big file, you can pass engine='python' as below, and it should work.

df = pd.read_csv( file_, index_col=None, header=0, engine='python' )
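One caveat worth adding: the error_bad_lines flag shown in older answers was deprecated in pandas 1.3 and later removed; in newer versions the equivalent is on_bad_lines='skip'. A sketch with invented data showing the Python engine skipping a malformed row:

```python
import io

import pandas as pd

# Invented sample with one malformed row (3 fields instead of 2)
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# engine="python" is slower but more tolerant than the C engine;
# on_bad_lines="skip" (pandas >= 1.3) drops rows with too many fields
df = pd.read_csv(io.StringIO(raw), engine="python", on_bad_lines="skip")
```

The trade-off is speed: the Python engine is noticeably slower on large files, so it is best used when the C engine chokes on the input.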

Firas Aswad answered Sep 21 '22 22:09