Python Pandas dataframe subtract cumulative column

I have some data that I am importing into a Pandas dataframe. The data is cumulative and indexed by a time series, as shown below:

                        Raw data
2016-11-23 10:00:00     48.6 
2016-11-23 11:00:00     158.7 
2016-11-23 12:00:00     377.8 
2016-11-23 13:00:00     591.7 
2016-11-23 14:00:00     748.5 
2016-11-23 15:00:00     848.2 

The data is updated daily, so the time series will move forward a day each day.

What I need to do is to take this dataframe and create a new column as shown below. The first row simply copies the data from the "Raw data" column. Then each subsequent row takes the data from the "Raw data" column, and subtracts the value that appeared before it, e.g. 158.7 - 48.6 = 110.1, 377.8 - 158.7 = 219.1, etc.

Does anyone know how I can produce the "Processed data" column in Python/Pandas?

                    Raw data    Processed data
23/11/2016 10:00    48.6        48.6
23/11/2016 11:00    158.7       110.1
23/11/2016 12:00    377.8       219.1
23/11/2016 13:00    591.7       213.9
23/11/2016 14:00    748.5       156.8
23/11/2016 15:00    848.2       99.7
pottolom asked Nov 24 '16 12:11


1 Answer

You can subtract the shifted column with sub. Because the first row has no previous value, the result there is NaN, so the last step fills it with the first value from Raw data:

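# Subtract the previous row's value from each row; the first row becomes NaN
df['Processed data'] = df['Raw data'].sub(df['Raw data'].shift())
# Fill the first row with the raw value, using a single .loc call to avoid chained assignment
df.loc[df.index[0], 'Processed data'] = df['Raw data'].iloc[0]
print(df)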
                     Raw data  Processed data
2016-11-23 10:00:00      48.6            48.6
2016-11-23 11:00:00     158.7           110.1
2016-11-23 12:00:00     377.8           219.1
2016-11-23 13:00:00     591.7           213.9
2016-11-23 14:00:00     748.5           156.8
2016-11-23 15:00:00     848.2            99.7
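
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand (the construction of df is an assumption, since the question does not show how the data is imported) and uses diff() with fillna() as a one-line alternative to sub/shift; this is a variation, not the exact code above.

```python
import pandas as pd

# Sample data copied from the question; building the frame by hand is an
# assumption, since the real data is imported from elsewhere.
index = pd.to_datetime([
    "2016-11-23 10:00:00",
    "2016-11-23 11:00:00",
    "2016-11-23 12:00:00",
    "2016-11-23 13:00:00",
    "2016-11-23 14:00:00",
    "2016-11-23 15:00:00",
])
df = pd.DataFrame(
    {"Raw data": [48.6, 158.7, 377.8, 591.7, 748.5, 848.2]},
    index=index,
)

# diff() computes each value minus the previous one, leaving NaN in the
# first row. fillna() with the original column replaces that NaN with the
# first raw value, since the two Series share the same index.
df["Processed data"] = df["Raw data"].diff().fillna(df["Raw data"])

# Round only for display, to hide floating-point noise.
print(df.round(1))
```

Both approaches give the same numbers; diff() just saves writing the shift explicitly.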
jezrael answered Sep 25 '22 20:09