Suppose df.bun (df is a Pandas DataFrame) is a multi-index (date and name) Series whose values are category labels written as strings:
date     name     values
20170331 A122630  stock-a
         A123320  stock-a
         A152500  stock-b
         A167860  bond
         A196030  stock-a
         A196220  stock-a
         A204420  stock-a
         A204450  curncy-US
         A204480  raw-material
         A219900  stock-a
How can I turn this into the total counts per category within each date, together with each category's percentage, to make a table like the one below for each date?
date     variable      counts  Percentage
20170331 stock         7       70%
         bond          1       10%
         raw-material  1       10%
         curncy        1       10%
I have tried print(df.groupby('bun').count()) as a first attempt at this, but it does not give the per-date counts and percentages I need.
cf) Before getting df.bun I used the following code to import a nested dictionary into a Pandas DataFrame.
import numpy as np
import pandas as pd
result = pd.DataFrame()
origDict = np.load("Hannah Lee.npy")
for item in range(len(origDict)):
    newdict = {(k1, k2): v2 for k1, v1 in origDict[item].items() for k2, v2 in origDict[item][k1].items()}

df = pd.DataFrame([newdict[i] for i in sorted(newdict)],
                  index=pd.MultiIndex.from_tuples([i for i in sorted(newdict.keys())]))
print(df.bun)
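For reference, here is a minimal sketch that rebuilds the sample printed above as a DataFrame with a MultiIndex (date, name) and a single column, so the answer code below can be run as-is. It is only an illustrative reconstruction, not the original "Hannah Lee.npy" data; the column is named 'values' to match the printout header and the answer code, even though the prose calls the Series df.bun.

import pandas as pd

# Rebuild the sample shown above: MultiIndex (date, name), one column 'values'
# holding the category strings. Illustrative reconstruction only.
index = pd.MultiIndex.from_product(
    [['20170331'],
     ['A122630', 'A123320', 'A152500', 'A167860', 'A196030',
      'A196220', 'A204420', 'A204450', 'A204480', 'A219900']],
    names=['date', 'name'])
df = pd.DataFrame(
    {'values': ['stock-a', 'stock-a', 'stock-b', 'bond', 'stock-a',
                'stock-a', 'stock-a', 'curncy-US', 'raw-material', 'stock-a']},
    index=index)
print(df)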
I believe you need SeriesGroupBy.value_counts:
g = df.groupby('date')['values']
df = pd.concat([g.value_counts(),
                g.value_counts(normalize=True).mul(100)],
               axis=1, keys=('counts', 'percentage'))
print(df)
                       counts  percentage
date     values
20170331 stock-a            6        60.0
         bond               1        10.0
         curncy-US          1        10.0
         raw-material       1        10.0
         stock-b            1        10.0
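If you want the percentage displayed with a trailing % sign as in the desired table, one possible sketch is to format the numeric column as strings for display only (keeping the numeric values for further math):

# Cosmetic formatting of the percentage column, e.g. 60.0 -> '60%'.
print(df.assign(percentage=df['percentage'].map('{:.0f}%'.format)))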
Another solution: use size for the counts and then divide by a new Series created by transform with sum:
df2 = df.reset_index().groupby(['date', 'values']).size().to_frame('count')
df2['percentage'] = df2['count'].div(df2.groupby('date')['count'].transform('sum')).mul(100)
print(df2)
                       count  percentage
date     values
20170331 bond              1        10.0
         curncy-US         1        10.0
         raw-material      1        10.0
         stock-a            6        60.0
         stock-b            1        10.0
The difference between the solutions: the first sorts by the counts within each group (value_counts returns counts in descending order), while the second sorts by the MultiIndex.
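Note that the desired table in the question also collapses stock-a / stock-b into stock and curncy-US into curncy, which neither snippet does by itself. A minimal sketch, starting again from the original df with the 'values' column and assuming an explicit mapping of the detailed labels to the coarser ones (adjust the mapping to your real data), could look like this:

# Map detailed categories to the coarser labels used in the desired table.
# The mapping below is an assumption based on the sample data.
base = df['values'].replace({'stock-a': 'stock', 'stock-b': 'stock',
                             'curncy-US': 'curncy'})
g = df.assign(variable=base).groupby('date')['variable']
out = pd.concat([g.value_counts(),
                 g.value_counts(normalize=True).mul(100)],
                axis=1, keys=('counts', 'percentage'))
print(out)

With the sample data this yields stock 7 / 70.0, and bond, curncy and raw-material each 1 / 10.0, matching the desired table.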