
Getting counts and percentages per group in pandas (Python)

Suppose df.bun (df is a pandas DataFrame) has a MultiIndex (date and name), and its values are category labels stored as strings:

date      name             values
20170331  A122630          stock-a
          A123320          stock-a
          A152500          stock-b
          A167860          bond
          A196030          stock-a
          A196220          stock-a
          A204420          stock-a
          A204450          curncy-US
          A204480          raw-material
          A219900          stock-a

How can I compute, for each date, the total count of each category and its percentage of that date's rows, to get a table like the one below?

date           variable    counts     Percentage
20170331          stock         7           70%
                   bond         1           10%
           raw-material         1           10%
                 curncy         1           10%

I tried print(df.groupby('bun').count()) as a first attempt, but it falls short of what I need.

For reference, before getting df.bun I used the following code to load a nested dictionary into a pandas DataFrame:

import numpy as np
import pandas as pd

result = pd.DataFrame()
# Recent NumPy versions need allow_pickle=True to load an array of dicts
origDict = np.load("Hannah Lee.npy", allow_pickle=True)
for item in range(len(origDict)):
    # Flatten the nested dict into {(outer_key, inner_key): record}
    newdict = {(k1, k2): v2 for k1, v1 in origDict[item].items() for k2, v2 in v1.items()}
    # Build one row per key pair, in sorted key order
    keys = sorted(newdict)
    df = pd.DataFrame([newdict[k] for k in keys],
                      index=pd.MultiIndex.from_tuples(keys))
    print(df.bun)
Hannah Lee asked May 04 '18 07:05



1 Answer

I believe you need SeriesGroupBy.value_counts:

g = df.groupby('date')['values']
df = pd.concat([g.value_counts(),
                g.value_counts(normalize=True).mul(100)],
               axis=1, keys=('counts', 'percentage'))
print(df)
                       counts  percentage
date     values                          
20170331 stock-a            6        60.0
         bond               1        10.0
         curncy-US          1        10.0
         raw-material       1        10.0
         stock-b            1        10.0

Another solution, applied to the original df, uses size for the counts and then divides by a new Series created with transform('sum'):

df2 = df.reset_index().groupby(['date', 'values']).size().to_frame('count')
df2['percentage'] = df2['count'].div(df2.groupby('date')['count'].transform('sum')).mul(100)
print(df2)
                       count  percentage
date     values                         
20170331 bond              1        10.0
         curncy-US         1        10.0
         raw-material      1        10.0
         stock-a           6        60.0
         stock-b           1        10.0
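
As a sanity check, here is a minimal self-contained sketch that rebuilds the sample data from the question and runs both approaches side by side. The index construction and the integer date 20170331 are assumptions, since the question does not show how the frame was built or what dtype the date has. The names by_value_counts and by_size are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question; the integer date is an assumption.
names = ['A122630', 'A123320', 'A152500', 'A167860', 'A196030',
         'A196220', 'A204420', 'A204450', 'A204480', 'A219900']
cats = ['stock-a', 'stock-a', 'stock-b', 'bond', 'stock-a',
        'stock-a', 'stock-a', 'curncy-US', 'raw-material', 'stock-a']
index = pd.MultiIndex.from_product([[20170331], names], names=['date', 'name'])
df = pd.DataFrame({'values': cats}, index=index)

# Approach 1: value_counts per date, raw and normalized
g = df.groupby('date')['values']
by_value_counts = pd.concat(
    [g.value_counts(), g.value_counts(normalize=True).mul(100)],
    axis=1, keys=('counts', 'percentage'))

# Approach 2: size per (date, values), divided by the per-date total
by_size = df.reset_index().groupby(['date', 'values']).size().to_frame('counts')
totals = by_size.groupby('date')['counts'].transform('sum')
by_size['percentage'] = by_size['counts'].div(totals).mul(100)

# After sorting the first result by its index, the two tables line up row by row
print(by_value_counts.sort_index())
print(by_size)
```

Keeping the original frame in df and storing each result under its own name avoids the situation where the second approach is accidentally run on the output of the first.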

The difference between the two solutions is the ordering: the first sorts by count within each group, while the second sorts by the MultiIndex.
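
The table requested in the question merges stock-a and stock-b into a single stock row and shortens curncy-US to curncy, which neither result above does. One possible way to get there is to map the detailed labels to broader ones before counting. The mapping below is a guess inferred from the desired table, not something the question spells out, and the sample frame is rebuilt under the assumption that date is an integer.

```python
import pandas as pd

# Sample rows copied from the question; the integer date is an assumption.
names = ['A122630', 'A123320', 'A152500', 'A167860', 'A196030',
         'A196220', 'A204420', 'A204450', 'A204480', 'A219900']
cats = ['stock-a', 'stock-a', 'stock-b', 'bond', 'stock-a',
        'stock-a', 'stock-a', 'curncy-US', 'raw-material', 'stock-a']
index = pd.MultiIndex.from_product([[20170331], names], names=['date', 'name'])
df = pd.DataFrame({'values': cats}, index=index)

# Hypothetical mapping inferred from the desired output; labels that are
# not listed (bond, raw-material) are kept unchanged.
mapping = {'stock-a': 'stock', 'stock-b': 'stock', 'curncy-US': 'curncy'}
broad = df['values'].map(lambda v: mapping.get(v, v)).rename('variable')

# Count rows per (date, variable), then divide by the per-date total
table = broad.reset_index().groupby(['date', 'variable']).size().to_frame('counts')
totals = table.groupby(level='date')['counts'].transform('sum')
table['percentage'] = table['counts'] / totals * 100

# Optional display column with a percent sign, as in the question
table['label'] = table['percentage'].map('{:.0f}%'.format)
print(table)
```

An explicit dictionary is used instead of splitting on the hyphen because raw-material also contains a hyphen but stays intact in the requested table.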

jezrael answered Nov 08 '22 18:11