Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas DataFrames: Create new rows with calculations across existing rows

How can I create new rows from an existing DataFrame by grouping by certain fields (in the example "Country" and "Industry") and applying some math to another field (in the example "Field" and "Value")?

Source DataFrame

df = pd.DataFrame({'Country': ['USA','USA','USA','USA','USA','USA','Canada','Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 
                                'Retail', 'Energy', 'Energy', 
                                'Retail', 'Retail'],
                   'Field': ['Import', 'Export','Import', 
                             'Export','Import', 'Export',
                             'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

    Country Industry    Field   Value
0   USA     Finance     Import  100
1   USA     Finance     Export  50
2   USA     Retail      Import  80
3   USA     Retail      Export  10
4   USA     Energy      Import  20
5   USA     Energy      Export  5
6   Canada  Retail      Import  30
7   Canada  Retail      Export  10

Target DataFrame

Net = Import - Export

    Country Industry    Field   Value
0   USA     Finance     Net     50
1   USA     Retail      Net     70
2   USA     Energy      Net     15
3   Canada  Retail      Net     20
like image 231
Lorenz Avatar asked Apr 13 '19 21:04

Lorenz


People also ask

How do you mix rows in pandas?

One of the easiest ways to shuffle a Pandas Dataframe is to use the Pandas sample method. The df. sample method allows you to sample a number of rows in a Pandas Dataframe in a random order. Because of this, we can simply specify that we want to return the entire Pandas Dataframe, in a random order.

How do I create a new data frame from an existing column?

You can create a new DataFrame of a specific column by using DataFrame. assign() method. The assign() method assign new columns to a DataFrame, returning a new object (a copy) with the new columns added to the original ones.

How do I add rows to pandas DataFrame from another DataFrame?

append() function is used to append rows of other dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframes are added as new columns and the new cells are populated with NaN value. ignore_index : If True, do not use the index labels.


2 Answers

There are quite possibly many ways. Here's one using groupby and unstack:

(df.groupby(['Country', 'Industry', 'Field'], sort=False)['Value']
   .sum()
   .unstack('Field')
   .eval('Import - Export')
   .reset_index(name='Value'))

  Country Industry  Value
0     USA  Finance     50
1     USA   Retail     70
2     USA   Energy     15
3  Canada   Retail     20
like image 115
cs95 Avatar answered Oct 06 '22 01:10

cs95


IIUC

df=df.set_index(['Country','Industry'])

Newdf=(df.loc[df.Field=='Export','Value']-df.loc[df.Field=='Import','Value']).reset_index().assign(Field='Net')
Newdf
  Country Industry  Value Field
0     USA  Finance    -50   Net
1     USA   Retail    -70   Net
2     USA   Energy    -15   Net
3  Canada   Retail    -20   Net

pivot_table

df.pivot_table(index=['Country','Industry'],columns='Field',values='Value',aggfunc='sum').\
  diff(axis=1).\
     dropna(1).\
        rename(columns={'Import':'Value'}).\
          reset_index()
Out[112]: 
Field Country Industry  Value
0      Canada   Retail   20.0
1         USA   Energy   15.0
2         USA  Finance   50.0
3         USA   Retail   70.0
like image 45
BENY Avatar answered Oct 05 '22 23:10

BENY