I'm trying to read large data (thousands of rows) through a python script from csv files which look like this:
.....
2015-11-03 20:16:28,000;63,62;
2015-11-03 20:16:29,000;63,75;
2015-11-03 20:16:30,000;63,86;
2015-11-03 20:16:31,000;64,25;
but it appears that one of the files has extra empty rows containing 196541465 blank spaces, and the code then crashes when reading it with read_csv from the pandas library:
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 4221, in append
elif isinstance(other, list) and not isinstance(other[0], DataFrame):
IndexError: list index out of range
I'm using the following command:
data = pd.read_csv(input_file,skiprows = [0],usecols=[0,1,2],delimiter=';',decimal=',', names = [ 'date','angle','Unnamed'],na_filter = False,parse_dates = [0],date_parser = reformat_date,error_bad_lines = False,skip_blank_lines=True)#,nrows = 8191)
The culprit row is the 8192nd; when limiting the rows (with nrows = 8191) it works just fine. I've tried many options from the docs, but none of them seems to work. Any idea?
You'll get the IndexError: list index out of range error when you try to access an item at an index that lies outside the list's valid range, so the item does not exist. This is quite common when you try to access the last item of a list, or the first one if you're using negative indexing.
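The traceback in the question appears to fit this pattern: it evaluates other[0] on a list, and an empty list has no element at index 0. A minimal sketch of the behaviour, using made-up lists and a hypothetical helper named first_or_none:

```python
# Hypothetical lists, used only to illustrate when IndexError is raised.
rows = ["a", "b", "c"]
empty = []


def first_or_none(items):
    """Return the first element, or None if the list is empty."""
    try:
        return items[0]
    except IndexError:
        # items[0] on an empty list raises IndexError: list index out of range
        return None


last = rows[len(rows) - 1]          # the last valid index is len(rows) - 1
first_of_empty = first_or_none(empty)
```

Checking the length before indexing (or catching the exception, as above) is one way to avoid the crash.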
Indexing in pandas simply means selecting particular rows and columns of data from a DataFrame. It could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of the rows and some of the columns. Indexing is also known as subset selection.
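The three kinds of selection described above can be sketched with loc and iloc. The DataFrame below is made-up sample data, loosely modelled on the date and angle columns from the question:

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({
    "date": ["2015-11-03 20:16:28", "2015-11-03 20:16:29", "2015-11-03 20:16:30"],
    "angle": [63.62, 63.75, 63.86],
})

# All of the rows, some of the columns (label-based selection).
all_rows_some_cols = df.loc[:, ["angle"]]

# Some of the rows, all of the columns (position-based selection).
some_rows_all_cols = df.iloc[0:2]

# Some of the rows and some of the columns (boolean mask plus column labels).
some_of_each = df.loc[df["angle"] > 63.7, ["date"]]
```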
I got this error because I was trying to read a CSV file that had too few headers vs. the number of columns (e.g. 10 columns, but only 8 headers). If you set index_col=False, pandas doesn't know what to do with the extra columns.
Edited according to Mitja's comment below.
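For context, here is a small sketch of how read_csv treats the simplest mismatch: exactly one more field per row than there are header names, caused by a trailing delimiter like the one in the question's rows. The data is made up, and files with several extra columns may behave differently from what is shown here.

```python
import io

import pandas as pd

# Made-up data: three header names, but every data row ends with an extra ';',
# so each row has four fields (the last one empty).
text = "a;b;c\n4;apple;bat;\n8;orange;cow;\n"

# Default behaviour: with one more data column than header names,
# the first column is used as the index.
default_df = pd.read_csv(io.StringIO(text), sep=";")

# index_col=False tells pandas not to use the first column as the index.
no_index_df = pd.read_csv(io.StringIO(text), sep=";", index_col=False)
```

In the first call the values 4 and 8 end up in the index and the remaining fields shift one name to the left; in the second call they stay in column a.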
I just had the same issue and index_col = False didn't work. I had 19 columns and only 17 headers. I solved it by reading the columns and headers separately and then adding the header names.
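import pandas as pd

# First pass: read a single row just to pick up the header names.
dfcolumns = pd.read_csv('file.csv', nrows=1)

# Second pass: skip the header line, read only as many columns as there are
# header names, and assign those names explicitly.
df = pd.read_csv('file.csv',
                 header=None,
                 skiprows=1,
                 usecols=list(range(len(dfcolumns.columns))),
                 names=dfcolumns.columns)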
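For the situation in the original question, where the extra rows contain nothing but blank spaces, another option might be to drop those lines before the text reaches read_csv. The following is a sketch under assumptions, not a verified fix for that file: the sample text is made up, and with a real file the lines would come from open(input_file) instead of io.StringIO(raw).

```python
import io

import pandas as pd

# Made-up sample in the question's format, with a whitespace-only row in the middle.
raw = (
    "date;angle;\n"
    "2015-11-03 20:16:28,000;63,62;\n"
    "          \n"
    "2015-11-03 20:16:29,000;63,75;\n"
)

# Keep only the lines that contain something other than whitespace.
cleaned = "".join(line for line in io.StringIO(raw) if line.strip())

# Parse the cleaned text with the same delimiter and decimal settings as the question.
data = pd.read_csv(
    io.StringIO(cleaned),
    skiprows=[0],
    delimiter=";",
    decimal=",",
    names=["date", "angle", "Unnamed"],
)
```

Filtering line by line keeps the approach independent of how read_csv itself treats whitespace-only rows.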