
Accessing a Pandas index like a regular column

I have a Pandas DataFrame with a named index. I want to pass it off to a piece of code that takes a DataFrame, a column name, and some other stuff, and does a bunch of work involving that column. Only in this case the column I want to highlight is the index, and giving the index's label to this piece of code doesn't work, because you can't extract an index the way you can a regular column. For example, I can construct a DataFrame like this:

import pandas as pd, numpy as np

# list(...) so this also works on Python 3, where map returns an iterator
df = pd.DataFrame({'name': list(map(chr, range(97, 102))),
                   'id': range(10000, 10005),
                   'value': np.random.randn(5)})
df.set_index('name', inplace=True)

Here's the result:

         id     value
name
a     10000  0.659710
b     10001  1.001821
c     10002 -0.197576
d     10003 -0.569181
e     10004 -0.882097

Now how do I go about accessing the name column?

print(df.index)  # No problem
print(df['name'])  # KeyError: u'name'

I know there are workarounds, like duplicating the column or changing the index to something else. But is there something cleaner, like some form of column access that treats the index the same way as everything else?

kuzzooroo asked Sep 02 '18 17:09



1 Answer

The index has a special meaning in Pandas. It's used to optimise specific operations, such as label-based lookups, and it plays a role in various methods such as merging and joining data. Therefore, make a choice:

  • If it's "just another column", use reset_index and treat it as another column.
  • If it's genuinely used for indexing, keep it as an index and use df.index.
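To make the two options concrete, here is a minimal sketch using a frame shaped like the one in the question. The variable name `flat` is just for illustration, and the random `value` column will differ on every run.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the question.
df = pd.DataFrame({
    "name": list("abcde"),
    "id": range(10000, 10005),
    "value": np.random.randn(5),
}).set_index("name")

# Option 1: "just another column".
# reset_index() returns a new frame in which 'name' is a regular column,
# so ordinary column access works. df itself is left unchanged.
flat = df.reset_index()
print(flat["name"].tolist())

# Option 2: genuinely an index.
# Keep it where it is and read it through df.index.
print(df.index.tolist())
print(df.index.name)
```

With option 1 you can hand `df.reset_index()` and the label `'name'` to code that expects a column name, without modifying the original frame.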

We can't make this choice for you. It should be dependent on the structure of your underlying data and on how you intend to analyse your data.
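If the code you are calling is yours to change and really has to accept either kind of label, one possible approach is a small lookup helper. This is only a sketch: `get_labelled` is a hypothetical name, not a Pandas function, and it assumes the label is either a column or the name of an index level.

```python
import pandas as pd


def get_labelled(df, label):
    """Hypothetical helper: return `label` as a Series, whether it names
    a column or an index level of `df`."""
    if label in df.columns:
        return df[label]
    if label in df.index.names:
        # get_level_values works for a plain Index and for a MultiIndex.
        values = df.index.get_level_values(label)
        return pd.Series(values, index=df.index, name=label)
    raise KeyError(label)


# Small example frame with a named index.
example = pd.DataFrame(
    {"id": [10000, 10001]},
    index=pd.Index(["a", "b"], name="name"),
)
print(get_labelled(example, "name").tolist())
print(get_labelled(example, "id").tolist())
```

Columns are checked first here, so a column wins if a column and an index level share a name. That ordering is a design choice of this sketch, not something Pandas prescribes.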

For more information on use of a dataframe index, see:

  • What is the performance impact of non-unique indexes in pandas?
  • What is the point of indexing in pandas?
jpp answered Sep 18 '22 15:09