Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to add a percentage computation in pandas result

I have the following working code. I need to add a percentage column to monitor changes. I dont know much on how to do it in pandas. I need ideas on what part needs to be modified.

import pandas as pd
dl = []
with open('sampledata.txt') as f:
    for line in f:
        parts = line.split()
        # Cleaning data here.. Conversions to int/float etc,
        if not parts[3][:2].startswith('($'):
            parts.insert(3,'0')
        if len(parts) > 5:
            temp = ' '.join(parts[4:])
            parts = parts[:4] + [temp]
        parts[1] = int(parts[1])
        parts[2] = float(parts[2].replace(',', ''))
        parts[3] = float(parts[3].strip('($)'))
        dl.append(parts)
headers = ['col1', 'col2', 'col3', 'col4', 'col5']
df = pd.DataFrame(dl,columns=headers)
df = df.groupby(['col1','col5']).sum().reset_index()
df = df.sort_values('col2',ascending=False)
df['col4'] =  '($' + df['col4'].astype(str) + ')'
df = df[headers]
print(df)

sampledata.txt #-- Sample Data Source file

alpha   1   54,00.01                    ABC DSW2S
bravo   3   500,000.00                  ACDEF
charlie 1   27,722.29 ($250.45)         DGAS-CAS
delta   2   11 ($10)                    SWSDSASS-CCSSW
echo    5   143,299.00 ($101)           ACS34S1
lima    6   45.00181 ($38.9)            FGF5GGD-DDD
falcon  3   0.1234                      DSS2SFS3
echo    8   145,300 ($125.01)           ACS34S1
charlie 10  252,336,733.383 ($492.06)   DGAS-CAS
romeo   12  980                         ASDS SSSS SDSD
falcon  5   9.19                        DSS2SFS3

Current Output: #-- working result

      col1  col2          col3       col4            col5
4     echo    13  2.885990e+05  ($226.01)         ACS34S1
7    romeo    12  9.800000e+02     ($0.0)  ASDS SSSS SDSD
2  charlie    11  2.523645e+08  ($742.51)        DGAS-CAS
5   falcon     8  9.313400e+00     ($0.0)        DSS2SFS3
6     lima     6  4.500181e+01    ($38.9)     FGF5GGD-DDD
1    bravo     3  5.000000e+05     ($0.0)           ACDEF
3    delta     2  1.100000e+01    ($10.0)  SWSDSASS-CCSSW
0    alpha     1  5.400010e+03     ($0.0)       ABC DSW2S

Improved Output: #-- with Additional Column for %

      col1  col2          col3       col4            col5   col6
4     echo    13  2.885990e+05  ($226.01)         ACS34S1   60%     #-- (5 + 8) = 13
7    romeo    12  9.800000e+02     ($0.0)  ASDS SSSS SDSD   0%
2  charlie    11  2.523645e+08  ($742.51)        DGAS-CAS   900%  #-- (1 + 10) = 11
5   falcon     8  9.313400e+00     ($0.0)        DSS2SFS3   66.67%  #-- (3 + 5) = 8
6     lima     6  4.500181e+01    ($38.9)     FGF5GGD-DDD   0%
1    bravo     3  5.000000e+05     ($0.0)           ACDEF   0%
3    delta     2  1.100000e+01    ($10.0)  SWSDSASS-CCSSW   0%
0    alpha     1  5.400010e+03     ($0.0)       ABC DSW2S   0%
like image 875
rbutrnz Avatar asked Nov 05 '21 11:11

rbutrnz


People also ask

How do I add percentages in pandas?

We can add percentage symbol to the results from pct_change() using style method and specify the format that we would like to have. Note that the row with nan values also have percentage sign and that does not make sense. We can change the values of nan using “na_rep” to format() function's argument.

How do you assign a percentage in python?

To calculate a percentage in Python, use the division operator (/) to get the quotient from two numbers and then multiply this quotient by 100 using the multiplication operator (*) to get the percentage. This is a simple equation in mathematics to get the percentage.

How do you calculate percentage increase in python?

pct_change() function calculates the percentage change between the current and a prior element. This function by default calculates the percentage change from the immediately previous row. Note : This function is mostly useful in the time-series data.


2 Answers

Setup:

#sampledata.txt
df = pd.DataFrame(data={'col1': ['alpha', 'bravo', 'charlie', 'delta', 'echo','lima', 'falcon', 'echo', 'charlie', 'romeo', 'falcon'],
                        'col2': [1, 3, 1, 2, 5, 6, 3, 8, 10, 12, 5],
                        'col3': ['54,00.01', '500,000.00', '27,722.29 ($250.45)', '11 ($10)', '143,299.00 ($101)', '45.00181 ($38.9)', '0.1234', '145,300 ($125.01)', '252,336,733.383 ($492.06)', '980', '9.19'],
                        'col4': ['ABC DSW2S', 'ACDEF', 'DGAS-CAS', 'SWSDSASS-CCSSW', 'ACS34S1', 'FGF5GGD-DDD', 'DSS2SFS3', 'ACS34S1', 'DGAS-CAS', 'ASDS SSSS SDSD', 'DSS2SFS3']})

Code:

df['within_brackets'] = df['col3'].str.extract('.*\((.*)\).*') #Extract whats inside the brackets.
df['within_brackets'].replace('\$', '', regex=True, inplace=True)
df['col3'] = df['col3'].str.replace(r"(\s*\(.*\))|,", "", regex=True) #Extract whats outside the brackets
df.rename(columns={'col4': 'col5', 'within_brackets': 'col4'}, inplace=True)
df[['col3', 'col4']] = df[['col3', 'col4']].astype(float)

df = df.groupby(['col1', 'col5']).agg(col2 = pd.NamedAgg(column="col2", aggfunc="sum"),
                                      col3 = pd.NamedAgg(column="col3", aggfunc="sum"),
                                      col4 = pd.NamedAgg(column="col4", aggfunc="sum"),
                                      col6 = pd.NamedAgg(column="col2", aggfunc=pd.Series.pct_change)).reset_index()
df['col6'].fillna(0, inplace=True)
#print df here and you will get to know what output looks like till now.
df['col6'].fillna(0, inplace=True)
df['col6'] = df['col6'].apply(lambda x: f"{str(round(x[-1], 4) * 100)}%" if isinstance(x, np.ndarray) else f"{round(x, 4) * 100}%")
df = df[['col1', 'col2', 'col3', 'col4', 'col5', 'col6']]
df.sort_values(by=['col2'], ascending=False, inplace=True)
print(df)

Output:

      col1  col2          col3    col4            col5    col6
4     echo    13  2.885990e+05  226.01         ACS34S1   60.0%
7    romeo    12  9.800000e+02    0.00  ASDS SSSS SDSD      0%
2  charlie    11  2.523645e+08  742.51        DGAS-CAS  900.0%
5   falcon     8  9.313400e+00    0.00        DSS2SFS3  66.67%
6     lima     6  4.500181e+01   38.90     FGF5GGD-DDD      0%
1    bravo     3  5.000000e+05    0.00           ACDEF      0%
3    delta     2  1.100000e+01   10.00  SWSDSASS-CCSSW      0%
0    alpha     1  5.400010e+03    0.00       ABC DSW2S      0%

Update Use this with your existing code: (Use the following code after df = pd.DataFrame(dl,columns=headers) this line in your code.)

df = df.groupby(['col1', 'col5']).agg(col2 = pd.NamedAgg(column="col2", aggfunc="sum"),
                                      col3 = pd.NamedAgg(column="col3", aggfunc="sum"),
                                      col4 = pd.NamedAgg(column="col4", aggfunc="sum"),
                                      col6 = pd.NamedAgg(column="col2", aggfunc=pd.Series.pct_change)).reset_index()
df['col6'].fillna(0, inplace=True)
#print df here and you will get to know what output looks like till now.
df['col6'].fillna(0, inplace=True)
df['col6'] = df['col6'].apply(lambda x: f"{str(round(x[-1], 4) * 100)}%" if isinstance(x, np.ndarray) else f"{round(x, 4) * 100}%")
df['col4'] =  '($' + df['col4'].astype(str) + ')'
df = df[['col1', 'col2', 'col3', 'col4', 'col5', 'col6']]
like image 140
Muhammad Hassan Avatar answered Oct 18 '22 02:10

Muhammad Hassan


You can add the following lines just after your code: The function compute_percentage() is using the list variable dl.

def compute_percentage(row):
    vl = [float(parts[1]) for parts in dl if parts[0] == row['col1']]
    i = round(100. * (vl[-1]-vl[0])/vl[0] if vl[0] != 0 else 0, 2)
    if float(int(i)) == i:
        i = int(i)
    return str(i) + '%'

df['col6'] = df.apply(compute_percentage, axis=1)

Output:

      col1  col2          col3       col4            col5    col6
4     echo    13  2.885990e+05  ($226.01)         ACS34S1     60%
7    romeo    12  9.800000e+02     ($0.0)  ASDS SSSS SDSD      0%
2  charlie    11  2.523645e+08  ($742.51)        DGAS-CAS    900%
5   falcon     8  9.313400e+00     ($0.0)        DSS2SFS3  66.67%
6     lima     6  4.500181e+01    ($38.9)     FGF5GGD-DDD      0%
1    bravo     3  5.000000e+05     ($0.0)           ACDEF      0%
3    delta     2  1.100000e+01    ($10.0)  SWSDSASS-CCSSW      0%
0    alpha     1  5.400010e+03     ($0.0)       ABC DSW2S      0%
like image 44
Eric Marchand Avatar answered Oct 18 '22 01:10

Eric Marchand