Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: How to find percentage of group members type per subgroup?

(Data sample and attempts at the end of the question)

With a dataframe such as this:

    Type    Class   Area    Decision
0   A       1       North   Yes
1   B       1       North   Yes
2   C       2       South   No
3   A       3       South   No
4   B       3       South   No
5   C       1       South   No
6   A       2       North   Yes
7   B       3       South   Yes
8   B       1       North   No
9   C       1       East    No
10  C       2       West    Yes 

How can I find what percentage of each type [A, B, C, D] that belongs to each area [North, South, East, West]?

Desired output:

    North   South   East    West
A   0.66    0.33    0       0
B   0.5     0.5     0       0
C   0       0.5     0.25    0.25

My best attempt so far is:

df_attempt1= df.groupby(['Area', 'Type'])['Type'].aggregate('count').unstack().T

Which returns:

Area  East  North  South  West
Type                          
A      NaN    2.0    1.0   NaN
B      NaN    2.0    2.0   NaN
C      1.0    NaN    2.0   1.0

And I guess I can build on that by calculating sums in the margins and appending 0 for missing observations, but I'd really appreciate suggestions for more elegant approaches.

Thank you for any suggestions!

Code:

import pandas as pd

df = pd.DataFrame(
    {
        "Type": {0: "A", 1: "B", 2: "C", 3: "A", 4: "B", 5: "C", 6: "A", 7: "B", 8: "B", 9: "C", 10: "C"},
        "Class": {0: 1, 1: 1, 2: 2, 3: 3, 4: 3, 5: 1, 6: 2, 7: 3, 8: 1, 9: 1, 10: 2},
        "Area": {0: "North", 1: "North", 2: "South", 3: "South", 4: "South", 5: "South", 6: "North", 7: "South", 8: "North", 9: "East", 10: "West"},
        "Decision": {0: "Yes", 1: "Yes", 2: "No", 3: "No", 4: "No", 5: "No", 6: "Yes", 7: "Yes", 8: "No", 9: "No", 10: "Yes"},
    }
)

dfg = df[['Area', 'Type']].groupby(['Area']).agg('count').unstack()

df_attempt1 = df.groupby(['Area', 'Type'])['Type'].aggregate('count').unstack().T
like image 281
vestland Avatar asked Jan 28 '20 10:01

vestland


People also ask

How do you calculate percentage in Groupby pandas?

You can caluclate pandas percentage with total by groupby() and DataFrame. transform() method. The transform() method allows you to execute a function for each value of the DataFrame. Here, the percentage directly summarized DataFrame, then the results will be calculated using all the data.

What is 75% in describe pandas?

75% - The 75% percentile*. max - the maximum value. *Percentile meaning: how many of the values are less than the given percentile.

What is the percentage of pandas?

A Percentage is calculated by the mathematical formula of dividing the value by the sum of all the values and then multiplying the sum by 100. This is also applicable in Pandas Dataframes.

How do you get a group in a Groupby pandas?

By doing groupby() pandas returns you a dict of grouped DFs. You can easily get the key list of this dict by python built in function keys() .


2 Answers

You can use the function crosstab:

pd.crosstab(index=df['Type'], columns=df['Area'], normalize='index')

Output:

Area  East     North     South  West
Type                                
A     0.00  0.666667  0.333333  0.00
B     0.00  0.500000  0.500000  0.00
C     0.25  0.000000  0.500000  0.25
like image 120
Mykola Zotko Avatar answered Sep 16 '22 15:09

Mykola Zotko


You were quite close already. The following should do the trick:

df.groupby('Type')['Area'].value_counts(normalize = True).unstack(fill_value=0)

Output:

Area    East    North       South       West
Type                
A       0.00    0.666667    0.333333    0.00
B       0.00    0.500000    0.500000    0.00
C       0.25    0.000000    0.500000    0.25

If order matters, you can reorder the dataframe manipulating it's columns attribute

like image 34
Lukas Thaler Avatar answered Sep 17 '22 15:09

Lukas Thaler