 

Pandas pivot_table group column by values

I am trying to use numeric values as the columns of a Pandas pivot_table. The problem is that, since almost every number is unique, the resulting pivot_table isn't a useful way to aggregate my data.

Here is what I have so far (fake data example):

import pandas as pd

df = pd.DataFrame({'Country': ['US', 'Brazil', 'France', 'Germany'],
                   'Continent': ['Americas', 'Americas', 'Europe', 'Europe'],
                   'Population': [321, 207, 80, 66]})

pd.pivot_table(df, index='Continent', columns='Population', aggfunc='count')

Here is an image of the resulting pivot_table.

How could I group my values into ranges based on my columns?

In other words, how can I count the countries with Population <100, 100-200, >300?

Bruno Vieira asked Jun 07 '17 18:06





1 Answer

Use pd.cut to bin Population into labeled ranges, then pivot on the new column:

import numpy as np

df = df.assign(PopGroup=pd.cut(df.Population, bins=[0, 100, 200, 300, np.inf], labels=['<100', '100-200', '200-300', '>300']))

Output:

  Continent  Country  Population PopGroup
0  Americas       US         321     >300
1  Americas   Brazil         207  200-300
2    Europe   France          80     <100
3    Europe  Germany          66     <100
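
One detail to watch with pd.cut is which side of each bin is closed. By default (right=True) the bins are closed on the right, so a value of exactly 100 gets the label '<100'. The sketch below compares this with right=False. The boundary values in it are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical boundary values, chosen only to show where each edge lands.
values = pd.Series([66, 100, 101, 200, 321])
bins = [0, 100, 200, 300, np.inf]
labels = ['<100', '100-200', '200-300', '>300']

# Default right=True: intervals are (0, 100], (100, 200], (200, 300], (300, inf]
right_closed = pd.cut(values, bins=bins, labels=labels)

# right=False: intervals are [0, 100), [100, 200), [200, 300), [300, inf)
left_closed = pd.cut(values, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'value': values,
                    'right=True': right_closed,
                    'right=False': left_closed}))
```

If the label '<100' should really exclude 100, right=False is probably the better fit for these labels.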

pd.pivot_table(df, index='Continent', columns='PopGroup', values=['Country'], aggfunc='count')

Output:

        Country          
PopGroup  200-300 <100 >300
Continent                  
Americas      1.0  NaN  1.0
Europe        NaN  2.0  NaN
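
If integer counts with zeros are preferred over NaN, and every range should appear even when it is empty, one possible variant is pd.crosstab followed by a reindex. This is a sketch that swaps pivot_table for crosstab and is not part of the original answer. The labels are cast to plain strings so the result does not depend on how a given pandas version handles unobserved categories.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['US', 'Brazil', 'France', 'Germany'],
                   'Continent': ['Americas', 'Americas', 'Europe', 'Europe'],
                   'Population': [321, 207, 80, 66]})

labels = ['<100', '100-200', '200-300', '>300']
df['PopGroup'] = pd.cut(df['Population'],
                        bins=[0, 100, 200, 300, np.inf],
                        labels=labels)

# crosstab counts rows per (Continent, PopGroup) pair and fills missing pairs with 0.
# reindex then restores the intended label order and adds any empty range as a 0 column.
counts = (
    pd.crosstab(df['Continent'], df['PopGroup'].astype(str))
      .reindex(columns=labels, fill_value=0)
)
print(counts)
```

Here the Americas row has one country each in '200-300' and '>300', the Europe row has two in '<100', and '100-200' appears as a column of zeros.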
Scott Boston answered Sep 20 '22 00:09
