
Python pivot_table - Add difference column

I am new to Python. I have the following data frame, which I am able to pivot in Excel.

I want to add the difference column (in the image, I added it manually).

The difference is the B value minus the A value. Using a Python pivot table, I can replicate everything except the difference column and the Grand Total. Below is my code.

table = pd.pivot_table(df, index=['HI Display1'], values=['Value'], columns=['Name', 'Date'], fill_value=0)

How can I add the difference column and calculate the value?

How can I get Grand Total at the bottom?

The data is as follows:

import pandas as pd

df = pd.DataFrame({
    "Value": [0.1, 0.2, 3, 1, -.5, 4],
    "Date": ["2020-07-01", "2020-07-01", "2020-07-01", "2020-07-01", "2020-07-01", "2020-07-01"],
    "Name": ['A', 'A', 'A', 'B', 'B', 'B'],
    "HI Display1": ["X", "Y", "Z", "Z", "Y", "X"]})

I want the pivot table to look like this:

Pivot table

Chatra asked Jul 02 '20 14:07


2 Answers

Here's a way to do that:

import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Date": "2020-07-01",
    "Value": [0.1, 0.2, 3, 2, -.5, 4],
    "Category": ["Z", "Y", "X", "Z", "Y", "X"]
})

# values="Value" keeps the string "Date" column out of the aggregation
piv = pd.pivot_table(df, index="Category", columns="Name", values="Value", aggfunc="sum")
piv.columns.name = None  # drop the "Name" label so the columns are plain A and B
piv["diff"] = piv.B - piv.A

The output (piv) is:

            A    B  diff
Category                
X         3.0  4.0   1.0
Y         0.2 -0.5  -0.7
Z         0.1  2.0   1.9

To add a 'total' row for A and B, do:

piv.loc["total"] = piv.sum()

Remove the total from the 'diff' column:

piv.loc["total", "diff"] = ""  # or np.nan, if you'd like to be more 'pandas'
                               # style and keep the column numeric (pandas 2.1+
                               # warns when a string is written into a float column)

The output now is:

            A    B  diff
Category                
X         3.0  4.0   1.0
Y         0.2 -0.5  -0.7
Z         0.1  2.0   1.9
total     3.3  5.5   
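
For reference, the steps so far can be put together into one runnable sketch. It makes two assumptions that are not in the original steps: the pivot is restricted to the "Value" column, and the 'diff' total is blanked with np.nan instead of an empty string, so the column stays float64 and prints as NaN rather than as a blank.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Date": "2020-07-01",
    "Value": [0.1, 0.2, 3, 2, -0.5, 4],
    "Category": ["Z", "Y", "X", "Z", "Y", "X"],
})

# Assumption: only "Value" is aggregated, so the string "Date" column is ignored.
piv = pd.pivot_table(df, index="Category", columns="Name",
                     values="Value", aggfunc="sum")
piv.columns.name = None            # drop the "Name" label above the columns
piv["diff"] = piv["B"] - piv["A"]  # per-category difference B - A

# Column-wise sums go into a new "total" row.
piv.loc["total"] = piv.sum()

# Assumption: blank the 'diff' total with NaN (not "") to keep the dtype numeric.
piv.loc["total", "diff"] = np.nan

print(piv)
```

If you need the blank cell only for display, you can keep the NaN in the data and hide it when rendering or exporting.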

If, at this point, you'd like to add the title 'Name' on top of the categories, do:

piv.columns = pd.MultiIndex.from_product([["Name"], piv.columns])

piv is now:

         Name          
            A    B diff
Category               
X         3.0  4.0  1.0
Y         0.2 -0.5 -0.7
Z         0.1  2.0  1.9
total     3.3  5.5  

To add the date to each column:

date = df.Date.max()
piv.columns = pd.MultiIndex.from_tuples([c+(date,) for c in piv.columns])

piv is now:
               Name                      
                  A          B       diff
         2020-07-01 2020-07-01 2020-07-01
Category                                 
X               3.0        4.0          1
Y               0.2       -0.5       -0.7
Z               0.1        2.0        1.9
total           3.3        5.5           
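
The two header steps above can also be done in a single call. This is only a sketch: it rebuilds a small flat table from the values shown earlier and assumes that every row shares one date, so MultiIndex.from_product can take one list per header level.

```python
import pandas as pd

# A small flat table shaped like the pivot above (values copied from its output).
piv = pd.DataFrame(
    {"A": [3.0, 0.2, 0.1], "B": [4.0, -0.5, 2.0]},
    index=pd.Index(["X", "Y", "Z"], name="Category"),
)
piv["diff"] = piv["B"] - piv["A"]

date = "2020-07-01"  # assumption: a single date shared by all rows

# One list per header level: the title, the existing column names, the date.
piv.columns = pd.MultiIndex.from_product([["Name"], piv.columns, [date]])

print(piv)
```

After this, a column is addressed by its full tuple, for example piv[("Name", "diff", "2020-07-01")].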

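Finally, to color a column (e.g. if you're using Jupyter), do the following. The gradient is computed from numeric values, so the 'total' cell of 'diff' should hold np.nan rather than "" here. Note that set_na_rep was removed in pandas 2.0; format(na_rep="") replaces it.

diff_col = piv.columns[2]  # the third column, i.e. 'diff'
piv.style.background_gradient("PiYG", subset=[diff_col]).highlight_null('white').format(na_rep="")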

Roy2012 answered Sep 20 '22 11:09


Another way to add totals is to pass margins=True to the pivot function and then overwrite the 'Totals' column with the difference, like this:

data = {
        'Name':['A', 'A' ,'A', 'B', 'B', 'B','A', 'A' ,'A', 'B', 'B', 'B' ],
        'Value':[1, 2, 3, 4, 5, 6,1, 2, 3, 4, 5, 6, ],
        'Category': ['X', 'Y', 'Z','X', 'Y', 'Z','X', 'Y', 'Z','X', 'Y', 'Z']
    }

df = pd.DataFrame(data)

pivot_ = df.pivot_table(index = ["Category"], 
              columns = "Name" , 
              values = "Value", 
              aggfunc = "sum", 
              margins=True, 
              margins_name='Totals')\
 .fillna('')

pivot_['Totals'] = pivot_['B'] - pivot_['A']

pivot_ = pivot_.rename(columns={"Totals": "Diff"})  # rename returns a copy, so assign it back

Output:

Name    A   B   Diff
Category            
X       2   8   6
Y       4   10  6
Z       6   12  6
Totals  12  30  18
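
The reason this works is that margins=True appends both a 'Totals' row and a 'Totals' column. If you'd rather not overwrite the margin column, here is a sketch of a variant (my own, not required by the approach above) that drops it and stores the difference under its own name.

```python
import pandas as pd

# Same sample data as above, written more compactly.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"] * 2,
    "Value": [1, 2, 3, 4, 5, 6] * 2,
    "Category": ["X", "Y", "Z"] * 4,
})

pivot_ = df.pivot_table(index="Category", columns="Name", values="Value",
                        aggfunc="sum", margins=True, margins_name="Totals")

# margins=True added a "Totals" row and a "Totals" column.
# Keep the row, drop the column, and compute B - A explicitly.
pivot_ = pivot_.drop(columns="Totals")
pivot_["Diff"] = pivot_["B"] - pivot_["A"]

print(pivot_)
```

The 'Totals' row is differenced like any other row, so its 'Diff' cell is the grand total of B minus the grand total of A.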

EDIT BASED ON QUESTION UPDATE:

Let's use the sample data you have now provided (loaded into a DataFrame called df_1):

pivot_1 = df_1.pivot_table(index = ["HI Display1"], 
              columns = ["Name", 'Date'], 
              values = "Value", 
              aggfunc = "sum", 
              margins=True, 
              margins_name='Totals'
).fillna('')

pivot_1['Totals'] = pivot_1['B'].sum(axis=1) - pivot_1['A'].sum(axis=1)

pivot_1 = pivot_1.rename(columns={"Totals": "Diff"})  # rename returns a copy, so assign it back

Output:

Name        A           B           Diff
Date        2020-07-01  2020-07-01  
HI Display1         
X           0.1         4.0         3.9
Y           0.2         -0.5        -0.7
Z           3.0         1.0         -2.0
Totals      3.3         4.5         1.2
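
To see why sum(axis=1) is needed with two column levels, here is a small sketch using the question's data. The name df_1 and the intermediate variables are only illustrative. Selecting a top-level name such as 'A' returns a DataFrame with one column per date, so each side is summed across its dates before subtracting.

```python
import pandas as pd

# The sample data from the question; df_1 is an illustrative name.
df_1 = pd.DataFrame({
    "Value": [0.1, 0.2, 3, 1, -0.5, 4],
    "Date": ["2020-07-01"] * 6,
    "Name": ["A", "A", "A", "B", "B", "B"],
    "HI Display1": ["X", "Y", "Z", "Z", "Y", "X"],
})

pivot_1 = df_1.pivot_table(index="HI Display1", columns=["Name", "Date"],
                           values="Value", aggfunc="sum",
                           margins=True, margins_name="Totals")

# pivot_1["A"] is a DataFrame (one column per date), not a Series.
total_a = pivot_1["A"].sum(axis=1)
total_b = pivot_1["B"].sum(axis=1)

# Row-wise difference, including the "Totals" row added by margins=True.
diff = total_b - total_a

print(diff)
```

With a single date the sum changes nothing, but the same code keeps working if more dates are added.
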
naccode answered Sep 19 '22 11:09