(Data sample and attempts at the end of the question)
With a dataframe such as this:
Type Class Area Decision
0 A 1 North Yes
1 B 1 North Yes
2 C 2 South No
3 A 3 South No
4 B 3 South No
5 C 1 South No
6 A 2 North Yes
7 B 3 South Yes
8 B 1 North No
9 C 1 East No
10 C 2 West Yes
How can I find what percentage of each type [A, B, C, D]
that belongs to each area [North, South, East, West]
?
Desired output:
North South East West
A 0.66 0.33 0 0
B 0.5 0.5 0 0
C 0 0.5 0.25 0.25
My best attempt so far is:
df_attempt1= df.groupby(['Area', 'Type'])['Type'].aggregate('count').unstack().T
Which returns:
Area East North South West
Type
A NaN 2.0 1.0 NaN
B NaN 2.0 2.0 NaN
C 1.0 NaN 2.0 1.0
And I guess I can build on that by calculating sums in the margins and appending 0
for missing observations, but I'd really appreciate suggestions for more elegant approaches.
Thank you for any suggestions!
Code:
import pandas as pd
df = pd.DataFrame(
{
"Type": {0: "A", 1: "B", 2: "C", 3: "A", 4: "B", 5: "C", 6: "A", 7: "B", 8: "B", 9: "C", 10: "C"},
"Class": {0: 1, 1: 1, 2: 2, 3: 3, 4: 3, 5: 1, 6: 2, 7: 3, 8: 1, 9: 1, 10: 2},
"Area": {0: "North", 1: "North", 2: "South", 3: "South", 4: "South", 5: "South", 6: "North", 7: "South", 8: "North", 9: "East", 10: "West"},
"Decision": {0: "Yes", 1: "Yes", 2: "No", 3: "No", 4: "No", 5: "No", 6: "Yes", 7: "Yes", 8: "No", 9: "No", 10: "Yes"},
}
)
dfg = df[['Area', 'Type']].groupby(['Area']).agg('count').unstack()
df_attempt1 = df.groupby(['Area', 'Type'])['Type'].aggregate('count').unstack().T
You can caluclate pandas percentage with total by groupby() and DataFrame. transform() method. The transform() method allows you to execute a function for each value of the DataFrame. Here, the percentage directly summarized DataFrame, then the results will be calculated using all the data.
75% - The 75% percentile*. max - the maximum value. *Percentile meaning: how many of the values are less than the given percentile.
A Percentage is calculated by the mathematical formula of dividing the value by the sum of all the values and then multiplying the sum by 100. This is also applicable in Pandas Dataframes.
By doing groupby() pandas returns you a dict of grouped DFs. You can easily get the key list of this dict by python built in function keys() .
You can use the function crosstab
:
pd.crosstab(index=df['Type'], columns=df['Area'], normalize='index')
Output:
Area East North South West
Type
A 0.00 0.666667 0.333333 0.00
B 0.00 0.500000 0.500000 0.00
C 0.25 0.000000 0.500000 0.25
You were quite close already. The following should do the trick:
df.groupby('Type')['Area'].value_counts(normalize = True).unstack(fill_value=0)
Output:
Area East North South West
Type
A 0.00 0.666667 0.333333 0.00
B 0.00 0.500000 0.500000 0.00
C 0.25 0.000000 0.500000 0.25
If order matters, you can reorder the dataframe manipulating it's columns attribute
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With