What are levels in a pandas DataFrame?

Tags:

I've been reading through the documentation and many explanations and examples use levels as something taken for granted. Imho the docs lack a bit on a fundamental explanation of the data structure and definitions.

What are levels in a data frame? What are levels in a MultiIndex index?

575

asked Jul 21 '17 11:07

Raf

1 Answers

I stumbled across this question while analyzing the answer to my own question, but I didn't find the John's answer satisfying enough. After a few experiments though I think I understood the levels and decided to share:

Short answer:

Levels are parts of the index or column.

Long answer:

I think this multi-column DataFrame.groupby example illustrates the index levels quite nicely.

Let's say we have the time logged on issues report data:

report = pd.DataFrame([         [1, 10, 'John'],         [1, 20, 'John'],         [1, 30, 'Tom'],         [1, 10, 'Bob'],         [2, 25, 'John'],         [2, 15, 'Bob']], columns = ['IssueKey','TimeSpent','User'])     IssueKey  TimeSpent  User 0         1         10  John 1         1         20  John 2         1         30   Tom 3         1         10   Bob 4         2         25  John 5         2         15   Bob

The index here has only 1 level (there is only one index value identifying every row). The index is artificial (running number) and consists of values form 0 to 5.

Say we want to merge (sum) all logs created by the same user to the same issue (to get the total time spent on the issue by the user)

time_logged_by_user = report.groupby(['IssueKey', 'User']).TimeSpent.sum()  IssueKey  User 1         Bob     10           John    30           Tom     30 2         Bob     15           John    25

Now our data index has 2 levels, as multiple users logged time to the same issue. The levels are IssueKey and User. The levels are parts of the index (only together they can identify a row in a DataFrame / Series).

Levels being parts of the index (as a tuple) can be nicely observed in the Spyder Variable explorer:

enter image description here

Having levels gives us opportunity to aggregate values within groups in respect to an index part (level) of our choice. E.g. if we want to assign the max time spent on an issue by any user, we can:

max_time_logged_to_an_issue = time_logged_by_user.groupby(level='IssueKey').transform('max')  IssueKey  User 1         Bob     30           John    30           Tom     30 2         Bob     25           John    25

Now the first 3 rows have the value 30, as they correspond to the issue 1 (User level was ignored in the code above). The same story for the issue 2.

This can be useful e.g. if we want to find out which users spent most time on every issue:

issue_owners = time_logged_by_user[time_logged_by_user == max_time_logged_to_an_issue]  IssueKey  User 1         John    30           Tom     30 2         John    25

191

answered Oct 02 '22 02:10

Andrzej Gis

Related questions
                            
                                Angular - unit test for a subscribe function in a component
                            
                                Angular 4. A series of http requests get cancelled
                            
                                How to use the selected period of time in a query?
                            
                                Invoke AWS Lambda function only once, at a single specified future time
                            
                                Subtract two columns in dataframe
                            
                                Angular Material Table filterPredicate
                            
                                Which JRE does C:\ProgramData\Oracle\Java\javapath\java.exe use?
                            
                                How to remove index signature using mapped types
                            
                                How to apply layer-wise learning rate in Pytorch?
                            
                                Crashlytics vs Fabric vs Firebase Crash Reporting — I'm lost
                            
                                horizontal grid only (in python using pandas plot + pyplot)
                            
                                Intellisense not working in code snippets - VS Code

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What are levels in a pandas DataFrame?

Tags:

Raf

People also ask

1 Answers

Andrzej Gis

Recent Activity

Donate For Us