Pandas read_excel sometimes creates index even when index_col=None

Tags: python, indexing, pandas, dataframe, excel

I'm trying to read an Excel file into a data frame and I want to set the index later, so I don't want pandas to use column 0 for the index values.

By default (index_col=None), it shouldn't use column 0 for the index, but I find that it does whenever cell A1 of the worksheet is empty.

Is there any way to override this behaviour? (I am loading many sheets that have no value in cell A1.)

This works as expected when test1.xlsx has the value "DATE" in cell A1:

In [19]: pd.read_excel('test1.xlsx')                                             
Out[19]: 
                 DATE         A         B         C
0 2018-01-01 00:00:00  0.766895  1.142639  0.810603
1 2018-01-01 01:00:00  0.605812  0.890286  0.810603
2 2018-01-01 02:00:00  0.623123  1.053022  0.810603
3 2018-01-01 03:00:00  0.740577  1.505082  0.810603
4 2018-01-01 04:00:00  0.335573 -0.024649  0.810603

But when the worksheet has no value in cell A1, it automatically assigns column 0 values to the index:

In [20]: pd.read_excel('test2.xlsx', index_col=None)                             
Out[20]: 
                            A         B         C
2018-01-01 00:00:00  0.766895  1.142639  0.810603
2018-01-01 01:00:00  0.605812  0.890286  0.810603
2018-01-01 02:00:00  0.623123  1.053022  0.810603
2018-01-01 03:00:00  0.740577  1.505082  0.810603
2018-01-01 04:00:00  0.335573 -0.024649  0.810603

This is not what I want.

Desired result: Same as first example (but with 'Unnamed' as the column label perhaps).

The documentation says:

index_col : int, list of int, default None.

Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column.

asked Feb 01 '19 22:02

Bill

2 Answers

The issue that you're describing matches a known pandas bug. This bug was fixed in the recent pandas 0.24.0 release:

Bug Fixes

Bug in read_excel() in which index_col=None was not being respected and parsing index columns anyway (GH18792, GH20480)
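If upgrading is not an option, one possible workaround is to move the unwanted index back into the frame after reading. The sketch below is only an illustration: `ensure_default_index` and its `label` argument are hypothetical names, not part of pandas, and the small frame is a stand-in for what an affected version returns for a sheet with an empty A1. It assumes a single-level index.

```python
import pandas as pd


def ensure_default_index(df, label="Unnamed: 0"):
    """Move an unwanted index back into the frame as the first column.

    Hypothetical helper for this sketch; assumes a single-level index.
    """
    # A frame that was read without an index column already has a RangeIndex.
    if isinstance(df.index, pd.RangeIndex):
        return df
    out = df.copy()
    # Name the index so that reset_index() yields a predictable column label.
    if out.index.name is None:
        out.index.name = label
    return out.reset_index()


# Stand-in for the frame an affected pandas version returns for test2.xlsx.
times = pd.to_datetime(
    ["2018-01-01 00:00:00", "2018-01-01 01:00:00", "2018-01-01 02:00:00"]
)
affected = pd.DataFrame(
    {"A": [0.766895, 0.605812, 0.623123], "B": [1.142639, 0.890286, 1.053022]},
    index=times,
)

fixed = ensure_default_index(affected)
print(fixed)
```

With a real workbook this would be called as `ensure_default_index(pd.read_excel('test2.xlsx', index_col=None))`. On a frame that already has a default index, such as the one pandas 0.24.0 or later should return, the helper hands the frame back unchanged.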

answered Oct 03 '22 21:10

Xukrao

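You can also use

index_col=0

instead of

index_col=None

This makes the first column the index explicitly, so every sheet is read the same way whether or not A1 is empty. If you want that column back as ordinary data, call reset_index() on the result.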
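A minimal sketch of that route follows. To keep it runnable without a workbook it uses `pd.read_csv` on an in-memory string as a stand-in for `pd.read_excel`, on the assumption that `index_col=0` selects the first column as the index in both readers. The label `"DATE"` is a hypothetical choice for the restored column.

```python
import io

import pandas as pd

# In-memory CSV standing in for a sheet whose first header cell (A1) is empty.
raw = io.StringIO(
    ",A,B\n"
    "2018-01-01 00:00:00,0.766895,1.142639\n"
    "2018-01-01 01:00:00,0.605812,0.890286\n"
)

# Explicitly use the first column as the index and parse it as datetimes.
indexed = pd.read_csv(raw, index_col=0, parse_dates=True)

# Name the index, then turn it back into an ordinary first column.
flat = indexed.rename_axis("DATE").reset_index()
print(flat)
```

Reading the first column as the index on purpose and then resetting it gives the same layout for sheets with and without a value in A1, at the cost of one extra step per sheet.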

answered Oct 03 '22 20:10

jabbiez
