Pandas read_excel sometimes creates index even when index_col=None

I'm trying to read an Excel file into a DataFrame and I want to set the index later, so I don't want pandas to use column 0 for the index values.

By default (index_col=None), it shouldn't use column 0 for the index, but I find that it does whenever cell A1 of the worksheet is empty.

Is there any way to override this behaviour? (I am loading many sheets that have no value in cell A1.)

This works as expected when test1.xlsx has the value "DATE" in cell A1:

In [19]: pd.read_excel('test1.xlsx')                                             
Out[19]: 
                 DATE         A         B         C
0 2018-01-01 00:00:00  0.766895  1.142639  0.810603
1 2018-01-01 01:00:00  0.605812  0.890286  0.810603
2 2018-01-01 02:00:00  0.623123  1.053022  0.810603
3 2018-01-01 03:00:00  0.740577  1.505082  0.810603
4 2018-01-01 04:00:00  0.335573 -0.024649  0.810603

But when the worksheet has no value in cell A1, it automatically assigns column 0 values to the index:

In [20]: pd.read_excel('test2.xlsx', index_col=None)                             
Out[20]: 
                            A         B         C
2018-01-01 00:00:00  0.766895  1.142639  0.810603
2018-01-01 01:00:00  0.605812  0.890286  0.810603
2018-01-01 02:00:00  0.623123  1.053022  0.810603
2018-01-01 03:00:00  0.740577  1.505082  0.810603
2018-01-01 04:00:00  0.335573 -0.024649  0.810603

This is not what I want.

Desired result: Same as first example (but with 'Unnamed' as the column label perhaps).

The documentation says:

index_col : int, list of int, default None.

Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column.

Bill asked Feb 01 '19 22:02


2 Answers

The issue that you're describing matches a known pandas bug. This bug was fixed in the recent pandas 0.24.0 release:

Bug Fixes

  • Bug in read_excel() in which index_col=None was not being respected and parsing index columns anyway (GH18792, GH20480)
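
If upgrading isn't an option, one possible workaround is to move the unwanted index back into an ordinary column after reading. The following is only a minimal sketch: `undo_implicit_index` is a hypothetical helper name (not a pandas function), the 'Unnamed: 0' label is just my choice to match the "Unnamed" label suggested in the question, and the small DataFrame is typed in by hand as a stand-in for what read_excel('test2.xlsx') returned, so that the snippet runs without the workbook.

```python
import pandas as pd


def undo_implicit_index(df, label="Unnamed: 0"):
    """Move a non-default index back into an ordinary first column.

    Hypothetical helper for illustration; not part of pandas.
    """
    # A plain RangeIndex means index_col=None was respected: nothing to undo.
    if isinstance(df.index, pd.RangeIndex):
        return df
    out = df.reset_index()
    # reset_index puts the old index in the first column; give it our label.
    return out.rename(columns={out.columns[0]: label})


# Hand-typed stand-in for the mis-parsed result shown in the question.
misparsed = pd.DataFrame(
    {"A": [0.766895, 0.605812], "B": [1.142639, 0.890286]},
    index=pd.to_datetime(["2018-01-01 00:00:00", "2018-01-01 01:00:00"]),
)

fixed = undo_implicit_index(misparsed)
print(fixed.columns.tolist())
```

With a real workbook you would pass the result of pd.read_excel(...) instead of the stand-in. Because the helper leaves frames with a default index untouched, it should be safe to apply to every sheet, whether or not cell A1 is filled in.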
Xukrao answered Oct 03 '22 21:10


You can also use

index_col=0

instead of

index_col = None
jabbiez answered Oct 03 '22 20:10