I'm trying to read an Excel file into a data frame and I want to set the index later, so I don't want pandas to use column 0 for the index values.
By default (index_col=None), it shouldn't use column 0 for the index, but I find that if there is no value in cell A1 of the worksheet, it will.
Is there any way to override this behaviour (I am loading many sheets that have no value in cell A1)?
This works as expected when test1.xlsx has the value "DATE" in cell A1:
In [19]: pd.read_excel('test1.xlsx')
Out[19]:
DATE A B C
0 2018-01-01 00:00:00 0.766895 1.142639 0.810603
1 2018-01-01 01:00:00 0.605812 0.890286 0.810603
2 2018-01-01 02:00:00 0.623123 1.053022 0.810603
3 2018-01-01 03:00:00 0.740577 1.505082 0.810603
4 2018-01-01 04:00:00 0.335573 -0.024649 0.810603
But when the worksheet has no value in cell A1, it automatically assigns column 0 values to the index:
In [20]: pd.read_excel('test2.xlsx', index_col=None)
Out[20]:
A B C
2018-01-01 00:00:00 0.766895 1.142639 0.810603
2018-01-01 01:00:00 0.605812 0.890286 0.810603
2018-01-01 02:00:00 0.623123 1.053022 0.810603
2018-01-01 03:00:00 0.740577 1.505082 0.810603
2018-01-01 04:00:00 0.335573 -0.024649 0.810603
This is not what I want.
Desired result: Same as first example (but with 'Unnamed' as the column label perhaps).
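To make the goal concrete, the sketch below hand-builds the frame the question is hoping for (a default RangeIndex, with the blank A1 header showing up as an "Unnamed: 0" column) and then sets the index later, as described. The frame is a stand-in for the read_excel result, and the "Unnamed: 0" label is an assumption borrowed from how read_csv names blank header cells.

```python
import pandas as pd

# Hand-built stand-in for the hoped-for read_excel("test2.xlsx") result:
# a default RangeIndex, with the blank A1 header surfacing as "Unnamed: 0"
# (that label is an assumption borrowed from read_csv's naming).
df = pd.DataFrame(
    {
        "Unnamed: 0": pd.to_datetime(
            ["2018-01-01 00:00:00", "2018-01-01 01:00:00", "2018-01-01 02:00:00"]
        ),
        "A": [0.766895, 0.605812, 0.623123],
        "B": [1.142639, 0.890286, 1.053022],
        "C": [0.810603, 0.810603, 0.810603],
    }
)

# "Set the index later": give the unnamed column a real name, then promote it.
indexed = df.rename(columns={"Unnamed: 0": "DATE"}).set_index("DATE")
```

Keeping the read and the set_index step separate means each sheet can be inspected or cleaned before deciding which column becomes the index.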
The documentation says:
index_col : int, list of int, default None.
Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column.
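The documented index_col semantics can be illustrated without an Excel file. This sketch uses read_csv on in-memory text as a stand-in for the two workbooks, on the assumption that its index_col parameter behaves as documented above; it also shows that read_csv keeps a column with a blank first header cell and labels it "Unnamed: 0" rather than turning it into the index.

```python
import io

import pandas as pd

# In-memory CSV stand-ins for the two workbooks; read_csv is used only because
# it shares read_excel's index_col parameter and needs no Excel engine.
WITH_HEADER = (
    "DATE,A,B,C\n"
    "2018-01-01 00:00:00,0.766895,1.142639,0.810603\n"
    "2018-01-01 01:00:00,0.605812,0.890286,0.810603\n"
)
BLANK_FIRST_HEADER = WITH_HEADER.replace("DATE", "", 1)

# index_col=None: no column is used as row labels; a RangeIndex is created.
df_none = pd.read_csv(io.StringIO(WITH_HEADER), index_col=None)

# index_col=0: the first (0-indexed) column becomes the row labels.
df_zero = pd.read_csv(io.StringIO(WITH_HEADER), index_col=0)

# Blank first header cell: read_csv keeps the column and labels it "Unnamed: 0".
df_blank = pd.read_csv(io.StringIO(BLANK_FIRST_HEADER), index_col=None)
```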
When reading a DataFrame from an Excel or CSV file, the index_col parameter of read_excel or read_csv specifies which column should become the index of the DataFrame.
A related pandas GitHub report notes that read_excel() seemed to give the same results for a file with multi-index columns whether index_col=None or index_col=0 (using the file from /pandas/test/io/data). The report attributes this to pandas.io.excel._pop_header_name(): when index_col is None, it pulls out the first column and then treats it as an index.
The issue you're describing matches a known pandas bug, which was fixed in the pandas 0.24.0 release:
Bug Fixes
- Bug in read_excel() in which index_col=None was not being respected and parsing index columns anyway (GH18792, GH20480)
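If upgrading to 0.24.0 isn't possible, one possible workaround is to read each sheet with header=None and promote the first row to column labels yourself. This is a sketch under an untested assumption: that with no header row, affected versions have no header label to treat as missing and so don't infer an index. promote_header is a hypothetical helper, not a pandas API, and a hand-built frame stands in for the read_excel call so the example runs without test2.xlsx.

```python
import pandas as pd


def promote_header(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn the first row of a header=None frame into column labels.

    promote_header is a hypothetical helper, not part of pandas. Blank cells
    in the first row (such as an empty A1) become "Unnamed: <position>",
    mirroring read_csv's naming for blank header cells.
    """
    labels = [
        f"Unnamed: {i}" if pd.isna(v) or v == "" else v
        for i, v in enumerate(raw.iloc[0])
    ]
    # Drop the header row, renumber from 0, and let pandas re-infer dtypes
    # (header=None leaves mixed text/number columns as object dtype).
    body = raw.iloc[1:].reset_index(drop=True).infer_objects()
    body.columns = labels
    return body


# Stand-in for: raw = pd.read_excel("test2.xlsx", header=None)
raw = pd.DataFrame(
    [
        [None, "A", "B"],
        ["2018-01-01 00:00:00", 0.766895, 1.142639],
        ["2018-01-01 01:00:00", 0.605812, 0.890286],
    ]
)
df = promote_header(raw)
```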
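If upgrading isn't an option, you can also pass index_col=0 instead of index_col=None, then call reset_index() on the result so the first column moves back out of the index and becomes an ordinary column.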
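Passing index_col=0 on its own reproduces the index seen in the second example; one way to get from there to the desired layout is DataFrame.reset_index(), which moves the index back into an ordinary column and restores a default RangeIndex. This sketch assumes the index_col=0 read leaves the dates in an index with no usable name, and uses a hand-built frame in place of the workbook so it runs without test2.xlsx.

```python
import pandas as pd

# Stand-in for pd.read_excel("test2.xlsx", index_col=0): the dates already sit
# in the index and the blank A1 leaves the index without a usable name.
read_with_index = pd.DataFrame(
    {
        "A": [0.766895, 0.605812],
        "B": [1.142639, 0.890286],
        "C": [0.810603, 0.810603],
    },
    index=pd.to_datetime(["2018-01-01 00:00:00", "2018-01-01 01:00:00"]),
)

# reset_index() moves the index back into an ordinary first column and
# restores a default RangeIndex.
df = read_with_index.reset_index()

# The new column's label depends on the index name ("index" when it is None),
# so rename it by position rather than by label.
df = df.rename(columns={df.columns[0]: "DATE"})
```

Renaming by position keeps the snippet working whatever name, if any, the index ends up with after the read.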