Pandas DataFrames: Create new rows with calculations across existing rows

Tags: python, pandas, dataframe

How can I create new rows from an existing DataFrame by grouping on certain columns ("Country" and "Industry" in the example) and applying a calculation across the rows of each group (using "Field" and "Value" in the example)?

Source DataFrame

import pandas as pd

df = pd.DataFrame({'Country': ['USA','USA','USA','USA','USA','USA','Canada','Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 
                                'Retail', 'Energy', 'Energy', 
                                'Retail', 'Retail'],
                   'Field': ['Import', 'Export','Import', 
                             'Export','Import', 'Export',
                             'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

    Country Industry    Field   Value
0   USA     Finance     Import  100
1   USA     Finance     Export  50
2   USA     Retail      Import  80
3   USA     Retail      Export  10
4   USA     Energy      Import  20
5   USA     Energy      Export  5
6   Canada  Retail      Import  30
7   Canada  Retail      Export  10

Target DataFrame

Net = Import - Export

    Country Industry    Field   Value
0   USA     Finance     Net     50
1   USA     Retail      Net     70
2   USA     Energy      Net     15
3   Canada  Retail      Net     20


asked Apr 13 '19 21:04

Lorenz

2 Answers

There are likely many ways to do this. Here's one using groupby and unstack:

(df.groupby(['Country', 'Industry', 'Field'], sort=False)['Value']
   .sum()
   .unstack('Field')
   .eval('Import - Export')
   .reset_index(name='Value'))

  Country Industry  Value
0     USA  Finance     50
1     USA   Retail     70
2     USA   Energy     15
3  Canada   Retail     20
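The result above has no Field column, while the target in the question does. The following is a minimal sketch of one way to add it, not part of the original answer: it uses plain column subtraction in place of eval, and the names wide, net and combined are illustrative. The final concat step assumes the Net rows should sit alongside the original rows, which the question does not state explicitly.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# One row per (Country, Industry), one column per Field value.
wide = (df.groupby(['Country', 'Industry', 'Field'], sort=False)['Value']
          .sum()
          .unstack('Field'))

# Net = Import - Export, then restore the column layout of the source frame.
net = ((wide['Import'] - wide['Export'])
       .reset_index(name='Value')
       .assign(Field='Net')
       [['Country', 'Industry', 'Field', 'Value']])

# Optional (an assumption about the intended use): append the Net rows
# to the original rows.
combined = pd.concat([df, net], ignore_index=True)

print(net)
```

If only the Net rows are needed, net already has the same columns as the target DataFrame and the concat step can be dropped.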


answered Oct 06 '22 01:10

cs95

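If I understand correctly, you can move Country and Industry into the index, then subtract the Export rows from the Import rows so that they align on that index:

indexed = df.set_index(['Country', 'Industry'])

# Net = Import - Export, aligned on the (Country, Industry) index
Newdf = (indexed.loc[indexed.Field == 'Import', 'Value']
         - indexed.loc[indexed.Field == 'Export', 'Value']).reset_index().assign(Field='Net')
Newdf
  Country Industry  Value Field
0     USA  Finance     50   Net
1     USA   Retail     70   Net
2     USA   Energy     15   Net
3  Canada   Retail     20   Net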
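The index-based approach relies on pandas aligning the two Series by their (Country, Industry) labels rather than by row position. Here is a small sketch that illustrates this on a shuffled copy of the question's data, using Net = Import - Export as defined in the question. The names shuffled, imports and exports are illustrative, and the sketch assumes each (Country, Industry) pair has exactly one Import row and one Export row.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# Shuffle the rows so Import and Export no longer appear in matching order.
shuffled = (df.sample(frac=1, random_state=0)
              .set_index(['Country', 'Industry']))

imports = shuffled.loc[shuffled['Field'] == 'Import', 'Value']
exports = shuffled.loc[shuffled['Field'] == 'Export', 'Value']

# Guard for the stated assumption: one Import and one Export per group.
assert not imports.index.duplicated().any()
assert not exports.index.duplicated().any()

# Subtraction matches rows by index label, so row order does not matter.
net = (imports - exports).reset_index().assign(Field='Net')
print(net)
```

If a group can contain several Import or Export rows, aggregating first (as the groupby and pivot_table approaches do) is the safer route.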

Or with pivot_table:

(df.pivot_table(index=['Country', 'Industry'], columns='Field', values='Value', aggfunc='sum')
   .diff(axis=1)    # columns sort as Export, Import, so Import now holds Import - Export
   .dropna(axis=1)  # drop the all-NaN Export column; axis is passed by keyword
   .rename(columns={'Import': 'Value'})
   .reset_index())

Field Country Industry  Value
0      Canada   Retail   20.0
1         USA   Energy   15.0
2         USA  Finance   50.0
3         USA   Retail   70.0
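The diff(axis=1) step works because the pivoted columns come out in alphabetical order (Export before Import). As a hedged alternative that does not depend on column order, the subtraction can be written out explicitly. This is a sketch rather than the answer's own code, and the names wide and net are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# One row per (Country, Industry), one column per Field value.
wide = df.pivot_table(index=['Country', 'Industry'], columns='Field',
                      values='Value', aggfunc='sum')

# Name both columns explicitly instead of relying on their order.
net = ((wide['Import'] - wide['Export'])
       .rename('Value')
       .reset_index()
       .assign(Field='Net'))

print(net)
```

Naming the columns also makes the sign of the result obvious when reading the code, which is easy to get wrong with diff.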

answered Oct 05 '22 23:10

BENY
