Pandas Pivot_Table : Percentage of row calculation for non-numeric values

Question

This is my DATA in dataframe "df":

Document    Name    Time
SPS2315511  A   1 HOUR
SPS2315512  B   1 - 2 HOUR
SPS2315513  C   2 - 3 HOUR
SPS2315514  C   1 HOUR
SPS2315515  B   1 HOUR
SPS2315516  A   2 - 3 HOUR
SPS2315517  A   1 - 2 HOUR

I am using the below code which gives me the summary of count in the pivot table,

table = pivot_table(df, values=["Document"],
                    index=["Name"], columns=["Time"],
                    aggfunc=lambda x: len(x),
                    margins=True, dropna=True)

but what i want is the % of row calculation as in excel pivot when you right click the pivot and select "show value as -> % of Row Total" . Since my Document is a non-numeric value i was not able to get it.

EXPECTED RESULT :

Count of Document   Column Labels

Name    1 HOUR  1 - 2 HOUR  2 - 3 HOUR  Grand Total
A   33.33%  33.33%  33.33%  100.00%
B   50.00%  50.00%  0.00%   100.00%
C   50.00%  0.00%   50.00%  100.00%
Grand Total 42.86%  28.57%  28.57%  100.00%

Can any one please help me figure out a way to get this result??

i am trying to manipulate the pivot data which will give me the row total,not the data from the dataframe and what i wanted is "% of row total". And also most importantly all my data are non-numeric values...

JohnE · Accepted Answer

The possible duplicate noted by @maxymoo is pretty close to a solution, but I'll go ahead and write it up as an answer since there are a couple of differences that are not completely straightforward.

table = pd.pivot_table(df, values=["Document"],
                       index=["Name"], columns=["Time"], 
                       aggfunc=len, margins=True, 
                       dropna=True, fill_value=0)

       Document                      
Time 1 - 2 HOUR 1 HOUR 2 - 3 HOUR All
Name                                 
A             1      1          1   3
B             1      1          0   2
C             0      1          1   2
All           2      3          2   7

The main tweak there is to add fill_value=0 because what you really want there is a count value of zero, not a NaN.

Then you can basically use the solution @maxymoo linked to, but you need to use iloc or similar b/c the table columns are a little complicated now (being a multi-indexed result of the pivot table).

table2 = table.div( table.iloc[:,-1], axis=0 )

       Document                         
Time 1 - 2 HOUR    1 HOUR 2 - 3 HOUR All
Name                                    
A      0.333333  0.333333   0.333333   1
B      0.500000  0.500000   0.000000   1
C      0.000000  0.500000   0.500000   1
All    0.285714  0.428571   0.285714   1

You've still got some minor formatting work to do there (flip first and second columns and convert to %), but those are the numbers you are looking for.

Btw, it's not necessary here, but you might want to think about converting 'Time' to an ordered categorical variable, which would be one way to solve the column ordering problem (I think), but may or may not be worth the bother depending on what else you are doing with the data.

sushmit · Answer

you can use something like this

df = pd.DataFrame({'Document':['SPS2315511','SPS2315512','SPS2315513','SPS2315514','SPS2315515','SPS2315516','SPS2315517'],
                  'Name':['A','B','C','C','B','A','A'],
                  'Time': ['1 HOUR','1 - 2 HOUR','2 - 3 HOUR','1 HOUR','1 HOUR','2 - 3 HOUR','1 - 2 HOUR']})
pd.crosstab(index= df.Name, columns= df.Time, values= df.Document, aggfunc = np.count_nonzero
            ,margins=True,margins_name='Total',normalize= 'index')

Pandas Pivot_Table : Percentage of row calculation for non-numeric values

Tags:

python

pandas

EXPECTED RESULT :

keerthi kumar

2 Answers

JohnE

sushmit

Recent Activity

Donate For Us

Pandas Pivot_Table : Percentage of row calculation for non-numeric values

Tags:

python

pandas

EXPECTED RESULT :

keerthi kumar

2 Answers

JohnE

sushmit

Related questions

Recent Activity

Donate For Us