Suppose df.bun (df is a Pandas DataFrame) is a multi-index (date and name) Series whose values are category labels written as strings:
date     name     values
20170331 A122630  stock-a
         A123320  stock-a
         A152500  stock-b
         A167860  bond
         A196030  stock-a
         A196220  stock-a
         A204420  stock-a
         A204450  curncy-US
         A204480  raw-material
         A219900  stock-a
How can I turn this into the total counts per category within each date, together with each category's percentage, to make a table like the one below for each date?
date     variable      counts  Percentage
20170331 stock         7       70%
         bond          1       10%
         raw-material  1       10%
         curncy        1       10%
I have tried print(df.groupby('bun').count()) as a first attempt at this, but it does not give the per-date counts and percentages I need.
cf) Before getting df.bun I used the following code to import a nested dictionary into a Pandas DataFrame.
import numpy as np
import pandas as pd
result = pd.DataFrame()
origDict = np.load("Hannah Lee.npy")
for item in range(len(origDict)):
    newdict = {(k1, k2): v2 for k1, v1 in origDict[item].items() for k2, v2 in origDict[item][k1].items()}

df = pd.DataFrame([newdict[i] for i in sorted(newdict)],
                  index=pd.MultiIndex.from_tuples([i for i in sorted(newdict.keys())]))
print(df.bun)
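For reference, here is a minimal sketch that rebuilds the sample printed above as a DataFrame with a MultiIndex (date, name) and a single column, so the answer code below can be run as-is. It is only an illustrative reconstruction, not the original "Hannah Lee.npy" data; the column is named 'values' to match the printout header and the answer code, even though the prose calls the Series df.bun.

import pandas as pd

# Rebuild the sample shown above: MultiIndex (date, name), one column 'values'
# holding the category strings. Illustrative reconstruction only.
index = pd.MultiIndex.from_product(
    [['20170331'],
     ['A122630', 'A123320', 'A152500', 'A167860', 'A196030',
      'A196220', 'A204420', 'A204450', 'A204480', 'A219900']],
    names=['date', 'name'])
df = pd.DataFrame(
    {'values': ['stock-a', 'stock-a', 'stock-b', 'bond', 'stock-a',
                'stock-a', 'stock-a', 'curncy-US', 'raw-material', 'stock-a']},
    index=index)
print(df)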
I believe you need SeriesGroupBy.value_counts:
g = df.groupby('date')['values']
df = pd.concat([g.value_counts(),
                g.value_counts(normalize=True).mul(100)],
               axis=1, keys=('counts', 'percentage'))
print(df)
                       counts  percentage
date     values
20170331 stock-a            6        60.0
         bond               1        10.0
         curncy-US          1        10.0
         raw-material       1        10.0
         stock-b            1        10.0
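If you want the percentage displayed with a trailing % sign as in the desired table, one possible sketch is to format the numeric column as strings for display only (keeping the numeric values for further math):

# Cosmetic formatting of the percentage column, e.g. 60.0 -> '60%'.
print(df.assign(percentage=df['percentage'].map('{:.0f}%'.format)))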
Another solution: use size for the counts and then divide by a new Series created by transform with sum:
df2 = df.reset_index().groupby(['date', 'values']).size().to_frame('count')
df2['percentage'] = df2['count'].div(df2.groupby('date')['count'].transform('sum')).mul(100)
print(df2)
                       count  percentage
date     values
20170331 bond              1        10.0
         curncy-US         1        10.0
         raw-material      1        10.0
         stock-a            6        60.0
         stock-b            1        10.0
The difference between the solutions: the first sorts by the counts within each group (value_counts returns counts in descending order), while the second sorts by the MultiIndex.
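Note that the desired table in the question also collapses stock-a / stock-b into stock and curncy-US into curncy, which neither snippet does by itself. A minimal sketch, starting again from the original df with the 'values' column and assuming an explicit mapping of the detailed labels to the coarser ones (adjust the mapping to your real data), could look like this:

# Map detailed categories to the coarser labels used in the desired table.
# The mapping below is an assumption based on the sample data.
base = df['values'].replace({'stock-a': 'stock', 'stock-b': 'stock',
                             'curncy-US': 'curncy'})
g = df.assign(variable=base).groupby('date')['variable']
out = pd.concat([g.value_counts(),
                 g.value_counts(normalize=True).mul(100)],
                axis=1, keys=('counts', 'percentage'))
print(out)

With the sample data this yields stock 7 / 70.0, and bond, curncy and raw-material each 1 / 10.0, matching the desired table.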