How to add a percentage computation in pandas result

Tags:

I have the following working code. I need to add a percentage column to monitor changes. I dont know much on how to do it in pandas. I need ideas on what part needs to be modified.

import pandas as pd
dl = []
with open('sampledata.txt') as f:
    for line in f:
        parts = line.split()
        # Cleaning data here.. Conversions to int/float etc,
        if not parts[3][:2].startswith('($'):
            parts.insert(3,'0')
        if len(parts) > 5:
            temp = ' '.join(parts[4:])
            parts = parts[:4] + [temp]
        parts[1] = int(parts[1])
        parts[2] = float(parts[2].replace(',', ''))
        parts[3] = float(parts[3].strip('($)'))
        dl.append(parts)
headers = ['col1', 'col2', 'col3', 'col4', 'col5']
df = pd.DataFrame(dl,columns=headers)
df = df.groupby(['col1','col5']).sum().reset_index()
df = df.sort_values('col2',ascending=False)
df['col4'] =  '($' + df['col4'].astype(str) + ')'
df = df[headers]
print(df)

sampledata.txt #-- Sample Data Source file

alpha   1   54,00.01                    ABC DSW2S
bravo   3   500,000.00                  ACDEF
charlie 1   27,722.29 ($250.45)         DGAS-CAS
delta   2   11 ($10)                    SWSDSASS-CCSSW
echo    5   143,299.00 ($101)           ACS34S1
lima    6   45.00181 ($38.9)            FGF5GGD-DDD
falcon  3   0.1234                      DSS2SFS3
echo    8   145,300 ($125.01)           ACS34S1
charlie 10  252,336,733.383 ($492.06)   DGAS-CAS
romeo   12  980                         ASDS SSSS SDSD
falcon  5   9.19                        DSS2SFS3

Current Output: #-- working result

      col1  col2          col3       col4            col5
4     echo    13  2.885990e+05  ($226.01)         ACS34S1
7    romeo    12  9.800000e+02     ($0.0)  ASDS SSSS SDSD
2  charlie    11  2.523645e+08  ($742.51)        DGAS-CAS
5   falcon     8  9.313400e+00     ($0.0)        DSS2SFS3
6     lima     6  4.500181e+01    ($38.9)     FGF5GGD-DDD
1    bravo     3  5.000000e+05     ($0.0)           ACDEF
3    delta     2  1.100000e+01    ($10.0)  SWSDSASS-CCSSW
0    alpha     1  5.400010e+03     ($0.0)       ABC DSW2S

Improved Output: #-- with Additional Column for %

      col1  col2          col3       col4            col5   col6
4     echo    13  2.885990e+05  ($226.01)         ACS34S1   60%     #-- (5 + 8) = 13
7    romeo    12  9.800000e+02     ($0.0)  ASDS SSSS SDSD   0%
2  charlie    11  2.523645e+08  ($742.51)        DGAS-CAS   900%  #-- (1 + 10) = 11
5   falcon     8  9.313400e+00     ($0.0)        DSS2SFS3   66.67%  #-- (3 + 5) = 8
6     lima     6  4.500181e+01    ($38.9)     FGF5GGD-DDD   0%
1    bravo     3  5.000000e+05     ($0.0)           ACDEF   0%
3    delta     2  1.100000e+01    ($10.0)  SWSDSASS-CCSSW   0%
0    alpha     1  5.400010e+03     ($0.0)       ABC DSW2S   0%

875

asked Nov 05 '21 11:11

rbutrnz

2 Answers

Setup:

#sampledata.txt
df = pd.DataFrame(data={'col1': ['alpha', 'bravo', 'charlie', 'delta', 'echo','lima', 'falcon', 'echo', 'charlie', 'romeo', 'falcon'],
                        'col2': [1, 3, 1, 2, 5, 6, 3, 8, 10, 12, 5],
                        'col3': ['54,00.01', '500,000.00', '27,722.29 ($250.45)', '11 ($10)', '143,299.00 ($101)', '45.00181 ($38.9)', '0.1234', '145,300 ($125.01)', '252,336,733.383 ($492.06)', '980', '9.19'],
                        'col4': ['ABC DSW2S', 'ACDEF', 'DGAS-CAS', 'SWSDSASS-CCSSW', 'ACS34S1', 'FGF5GGD-DDD', 'DSS2SFS3', 'ACS34S1', 'DGAS-CAS', 'ASDS SSSS SDSD', 'DSS2SFS3']})

Code:

df['within_brackets'] = df['col3'].str.extract('.*\((.*)\).*') #Extract whats inside the brackets.
df['within_brackets'].replace('\$', '', regex=True, inplace=True)
df['col3'] = df['col3'].str.replace(r"(\s*\(.*\))|,", "", regex=True) #Extract whats outside the brackets
df.rename(columns={'col4': 'col5', 'within_brackets': 'col4'}, inplace=True)
df[['col3', 'col4']] = df[['col3', 'col4']].astype(float)

df = df.groupby(['col1', 'col5']).agg(col2 = pd.NamedAgg(column="col2", aggfunc="sum"),
                                      col3 = pd.NamedAgg(column="col3", aggfunc="sum"),
                                      col4 = pd.NamedAgg(column="col4", aggfunc="sum"),
                                      col6 = pd.NamedAgg(column="col2", aggfunc=pd.Series.pct_change)).reset_index()
df['col6'].fillna(0, inplace=True)
#print df here and you will get to know what output looks like till now.
df['col6'].fillna(0, inplace=True)
df['col6'] = df['col6'].apply(lambda x: f"{str(round(x[-1], 4) * 100)}%" if isinstance(x, np.ndarray) else f"{round(x, 4) * 100}%")
df = df[['col1', 'col2', 'col3', 'col4', 'col5', 'col6']]
df.sort_values(by=['col2'], ascending=False, inplace=True)
print(df)

Output:

      col1  col2          col3    col4            col5    col6
4     echo    13  2.885990e+05  226.01         ACS34S1   60.0%
7    romeo    12  9.800000e+02    0.00  ASDS SSSS SDSD      0%
2  charlie    11  2.523645e+08  742.51        DGAS-CAS  900.0%
5   falcon     8  9.313400e+00    0.00        DSS2SFS3  66.67%
6     lima     6  4.500181e+01   38.90     FGF5GGD-DDD      0%
1    bravo     3  5.000000e+05    0.00           ACDEF      0%
3    delta     2  1.100000e+01   10.00  SWSDSASS-CCSSW      0%
0    alpha     1  5.400010e+03    0.00       ABC DSW2S      0%

Update Use this with your existing code: (Use the following code after df = pd.DataFrame(dl,columns=headers) this line in your code.)

df = df.groupby(['col1', 'col5']).agg(col2 = pd.NamedAgg(column="col2", aggfunc="sum"),
                                      col3 = pd.NamedAgg(column="col3", aggfunc="sum"),
                                      col4 = pd.NamedAgg(column="col4", aggfunc="sum"),
                                      col6 = pd.NamedAgg(column="col2", aggfunc=pd.Series.pct_change)).reset_index()
df['col6'].fillna(0, inplace=True)
#print df here and you will get to know what output looks like till now.
df['col6'].fillna(0, inplace=True)
df['col6'] = df['col6'].apply(lambda x: f"{str(round(x[-1], 4) * 100)}%" if isinstance(x, np.ndarray) else f"{round(x, 4) * 100}%")
df['col4'] =  '($' + df['col4'].astype(str) + ')'
df = df[['col1', 'col2', 'col3', 'col4', 'col5', 'col6']]

140

answered Oct 18 '22 02:10

Muhammad Hassan

You can add the following lines just after your code: The function compute_percentage() is using the list variable dl.

def compute_percentage(row):
    vl = [float(parts[1]) for parts in dl if parts[0] == row['col1']]
    i = round(100. * (vl[-1]-vl[0])/vl[0] if vl[0] != 0 else 0, 2)
    if float(int(i)) == i:
        i = int(i)
    return str(i) + '%'

df['col6'] = df.apply(compute_percentage, axis=1)

Output:

      col1  col2          col3       col4            col5    col6
4     echo    13  2.885990e+05  ($226.01)         ACS34S1     60%
7    romeo    12  9.800000e+02     ($0.0)  ASDS SSSS SDSD      0%
2  charlie    11  2.523645e+08  ($742.51)        DGAS-CAS    900%
5   falcon     8  9.313400e+00     ($0.0)        DSS2SFS3  66.67%
6     lima     6  4.500181e+01    ($38.9)     FGF5GGD-DDD      0%
1    bravo     3  5.000000e+05     ($0.0)           ACDEF      0%
3    delta     2  1.100000e+01    ($10.0)  SWSDSASS-CCSSW      0%
0    alpha     1  5.400010e+03     ($0.0)       ABC DSW2S      0%

answered Oct 18 '22 01:10

Eric Marchand

Related questions
                            
                                BS4 replace_with result is no longer in tree
                            
                                Matplotlib, vertical space between legend symbols
                            
                                Apply function to each row in Pandas dataframe by group
                            
                                How to run a Julia file, which uses a package, in Python?
                            
                                How to Use Class Weights with Focal Loss in PyTorch for Imbalanced dataset for MultiClass Classification
                            
                                Alembic ignore specific tables
                            
                                Train and predict on variable length sequences
                            
                                How to make setup.py for standalone python application in a right way?
                            
                                Why does this solution work in Javascript but not in Python? (Dynamic programming)
                            
                                Hung cells: running multiple jupyter notebooks in parallel with papermill
                            
                                Cannot install Google Colab locally
                            
                                "Defaulting to user installation because normal site-packages is not writeable" python message
                            
                                Create Pandas DataFrame from space separated String
                            
                                Is it possible to get Dash-Cytoscape graph tooltips?
                            
                                How to use RFE with xgboost Booster?
                            
                                Django Channels Redis: Exception inside application: Lock is not acquired
                            
                                Filter rows of python datatable based on whether it is in a list
                            
                                What is the right way to check if a type hint is annotated?
                            
                                Calculating angles of body skeleton in video using OpenPose
                            
                                OpenCV VideoCapture returns strange frame offset for different versions

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to add a percentage computation in pandas result

Tags:

python

python-3.x

pandas

dataframe

rbutrnz

People also ask

2 Answers

Muhammad Hassan

Eric Marchand

Recent Activity

Donate For Us