
Working with set_index in Pandas DataFrame

Tags: python, pandas

Using an imported CSV file, I indexed the DataFrame like this...

 rdata.set_index(['race_date', 'track_code', 'race_number', 'horse_name'])

This is what a section of the DataFrame looks like...

 race_date  track_code race_number horse_name          work_date  work_track
 2007-08-24 BM         8           Count Me Twice     2007-05-31         PLN
                                   Count Me Twice     2007-06-09         PLN
                                   Count Me Twice     2007-06-16         PLN
                                   Count Me Twice     2007-06-23         PLN
                                   Count Me Twice     2007-08-05         PLN
                                   Judge's Choice     2007-06-07          BM
                                   Judge's Choice     2007-06-14          BM
                                   Judge's Choice     2007-07-08          BM
                                   Judge's Choice     2007-08-18          BM

Why isn't the 'horse_name' column grouped the way the date, track and race are? If that's by design, how can I slice this larger DataFrame by race to get a new DataFrame with 'horse_name' as its index?

asked Aug 06 '13 03:08 by TravisVOX



1 Answer

It's not a bug. This is exactly how it's intended to work.

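A DataFrame has to show every single item in its data. So if the index has one level, that level will be fully expanded. If it has two levels, the first level will be grouped and the second will be fully expanded; if it has three levels, the first two will be grouped and the third will be expanded, and so on. The grouping only affects how the index is printed: repeated labels in the outer levels are left blank, but every row still has a value at every level.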

So this is why the horse name is not grouped. How would you be able to see all the items in the DataFrame if the horse name were grouped as well? :)
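To see that the grouping is purely a display matter, here is a minimal sketch. The sample rows are made up to mirror the columns in the question (they are not the asker's real CSV), and it relies on the pandas display option `display.multi_sparse`, which controls whether repeated outer labels are blanked out when a MultiIndex is printed.

```python
import pandas as pd

# Made-up sample rows that mirror the columns shown in the question.
rdata = pd.DataFrame({
    "race_date": ["2007-08-24"] * 4,
    "track_code": ["BM"] * 4,
    "race_number": [8] * 4,
    "horse_name": ["Count Me Twice", "Count Me Twice",
                   "Judge's Choice", "Judge's Choice"],
    "work_date": ["2007-05-31", "2007-06-09", "2007-06-07", "2007-06-14"],
    "work_track": ["PLN", "PLN", "BM", "BM"],
})

# set_index returns a new DataFrame, so keep the result.
indexed = rdata.set_index(["race_date", "track_code", "race_number", "horse_name"])

# Default display: repeated labels in the outer levels are left blank.
print(indexed)

# Same data with sparsified display turned off: every label is printed.
with pd.option_context("display.multi_sparse", False):
    print(indexed)

# The index itself stores a full tuple for every row either way.
first_key = indexed.index[0]
horse_names = indexed.index.get_level_values("horse_name").tolist()
print(first_key)
print(horse_names)
```

Both printouts describe the same DataFrame; only the rendering of the index differs.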

Try doing:

 rdata.set_index(['race_date', 'track_code', 'race_number'])

or:

 rdata.set_index(['race_date', 'track_code'])

You'll see that the last level of the index is always fully expanded, to enable you to see all the items in the DataFrame.
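As for slicing by race so that 'horse_name' becomes the index, one possible approach (a sketch, not necessarily the only or best way) is a cross-section with `DataFrame.xs` on the three outer levels. The sample data below is made up; in particular the second race, the track code "DMR" and "Hypothetical Horse" are invented for illustration.

```python
import pandas as pd

# Made-up sample: the first four rows echo the question, the last two
# (track "DMR", "Hypothetical Horse") are invented to give a second race.
rdata = pd.DataFrame({
    "race_date": ["2007-08-24"] * 4 + ["2007-08-25"] * 2,
    "track_code": ["BM"] * 4 + ["DMR"] * 2,
    "race_number": [8] * 4 + [3] * 2,
    "horse_name": ["Count Me Twice", "Count Me Twice",
                   "Judge's Choice", "Judge's Choice",
                   "Hypothetical Horse", "Hypothetical Horse"],
    "work_date": ["2007-05-31", "2007-06-09", "2007-06-07",
                  "2007-06-14", "2007-07-01", "2007-07-08"],
    "work_track": ["PLN", "PLN", "BM", "BM", "DMR", "DMR"],
})

indexed = rdata.set_index(["race_date", "track_code", "race_number", "horse_name"])

# Select one race by fixing the three outer levels. By default xs drops
# the levels used for selection, leaving horse_name as the only index.
race = indexed.xs(
    ("2007-08-24", "BM", 8),
    level=["race_date", "track_code", "race_number"],
)

print(race)
```

The result is an ordinary DataFrame holding only the rows for that race, indexed by 'horse_name'.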

answered Oct 01 '22 05:10 by Viktor Kerkez