
List index out of range with pandas read_csv

Tags: python, pandas, csv

I'm trying to read large data (thousands of rows) through a Python script from CSV files that look like this:

.....
2015-11-03 20:16:28,000;63,62;
2015-11-03 20:16:29,000;63,75;
2015-11-03 20:16:30,000;63,86;
2015-11-03 20:16:31,000;64,25;

but it appears that one of the files has extra empty rows containing 196541465 blank spaces, and the code crashes when reading it with pandas' read_csv.

     File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 4221, in append
        elif isinstance(other, list) and not isinstance(other[0], DataFrame):
IndexError: list index out of range

I'm using the following command:

data = pd.read_csv(input_file,
                   skiprows = [0],
                   usecols = [0, 1, 2],
                   delimiter = ';',
                   decimal = ',',
                   names = ['date', 'angle', 'Unnamed'],
                   na_filter = False,
                   parse_dates = [0],
                   date_parser = reformat_date,
                   error_bad_lines = False,
                   skip_blank_lines = True)  # , nrows = 8191)

The culprit row is the 8192nd; when I limit the rows (with nrows = 8191) it works just fine. I've tried many options from the docs, but none of them seems to work. Any ideas?

Nero Ouali asked Jun 22 '16 09:06



2 Answers

I got this error because I was trying to read a CSV file that had fewer headers than columns (e.g. 10 columns but only 8 headers). If you set index_col=False, pandas doesn't know what to do with the extra columns.
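To check whether a file has this kind of mismatch, one option is to compare the number of header fields with the number of fields in the data rows before calling read_csv. The following is only a rough sketch using the standard csv module; the helper name field_counts and the sample text are made up for illustration and are not part of the original question.

```python
import csv
import io

def field_counts(text, delimiter=","):
    """Return (number of header fields, set of field counts seen in data rows)."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(rows)
    # Completely empty lines come back as [] and are skipped here.
    data_counts = {len(row) for row in rows if row}
    return len(header), data_counts

# Hypothetical sample: 3 header names, 5 fields per data row.
sample = "a,b,c\n1,2,3,4,5\n6,7,8,9,10\n"
n_header, n_data = field_counts(sample)
print(n_header, n_data)
```

If the set of data field counts contains a number larger than the header count, the file has more columns than headers. For a real file, the same reader can be pointed at an open file object instead of io.StringIO.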

rogueleaderr answered Sep 24 '22 03:09


Edited according to Mitja's comment below.

I just had the same issue, and index_col = False didn't work. I had 19 columns and only 17 headers. I solved it by reading the header row and the data separately, then assigning the header names to the data.

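import pandas as pd

# Read just the header (plus one data row) to get the column names that exist.
dfcolumns = pd.read_csv('file.csv', nrows = 1)

# Re-read the data without a header, keeping only as many columns as there are names.
df = pd.read_csv('file.csv',
                 header = None,
                 skiprows = 1,
                 usecols = list(range(len(dfcolumns.columns))),
                 names = dfcolumns.columns)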
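For anyone who wants to try the two-pass approach without a file on disk, here is a small self-contained sketch. The in-memory sample (3 header names, 5 fields per row) is invented for illustration, and behaviour may differ between pandas versions, so treat it as a starting point, not a guaranteed fix.

```python
import io
import pandas as pd

# Hypothetical sample: 3 header names, but 5 fields in every data row.
raw = "a,b,c\n1,2,3,4,5\n6,7,8,9,10\n"

# Pass 1: read the header plus one data row, only to learn the column names.
header_only = pd.read_csv(io.StringIO(raw), nrows=1)
n_named = len(header_only.columns)

# Pass 2: skip the header line, keep only the first n_named fields of each row,
# and attach the names collected in pass 1.
df = pd.read_csv(io.StringIO(raw), header=None, skiprows=1,
                 usecols=list(range(n_named)), names=list(header_only.columns))
print(df)
```

Note that this keeps only the first n_named fields of each row; the trailing fields that have no header are dropped.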
Marcus Högenå Bohman answered Sep 22 '22 03:09