I am using a pandas/python dataframe. I am trying to do a lag subtraction.
I am currently using:
newCol = df.col - df.col.shift()
This leads to a NaN in the first spot:
NaN
45
63
23
...
First question: Is this the best way to do a subtraction like this?
Second: If I want to add another column (with the same number of rows) to this new column, is there a way to treat all the NaNs as 0 for the calculation?
Ex:
col_1 =
NaN
45
63
23
col_2 =
10
10
10
10
new_col =
10
55
73
33
and NOT
NaN
55
73
33
Thank you.
One option is to write a function that takes the column values as arguments and subtracts them, then use the apply method to run it on every row of the DataFrame.
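As a rough illustration of the function-plus-apply approach, here is a minimal sketch. The frame and its column names `a` and `b` are hypothetical and not from the original post. For simple arithmetic the vectorized form `df['a'] - df['b']` gives the same result and is usually faster; apply is shown only because the text mentions it.

```python
import pandas as pd

# Hypothetical example data; the column names 'a' and 'b' are assumptions.
df = pd.DataFrame({'a': [10, 20, 30], 'b': [1, 2, 3]})

def subtract_columns(row):
    """Return a - b for a single row (passed in as a Series)."""
    return row['a'] - row['b']

# axis=1 hands each row to the function in turn.
df['diff_apply'] = df.apply(subtract_columns, axis=1)

# Vectorized equivalent, with no Python-level loop over rows.
df['diff_vectorized'] = df['a'] - df['b']
```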
Calculate Sum of Given Columns: To sum a given list of columns, create a list of the column names you want, select those columns from the DataFrame, and call sum(). For example, df['Sum'] = df[col_list].sum(axis=1) gives the row-wise total.
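Applied to the data from the question, the row-wise sum might look like the sketch below. The column names `col_1` and `col_2` come from the question; `col_list` is just an illustrative variable name. Because `sum()` skips NaN by default (`skipna=True`), the first row comes out as 10 rather than NaN.

```python
import numpy as np
import pandas as pd

# Data taken from the example in the question.
df = pd.DataFrame({
    'col_1': [np.nan, 45, 63, 23],
    'col_2': [10, 10, 10, 10],
})

# Columns to include in the total.
col_list = ['col_1', 'col_2']

# Row-wise sum; NaN values are ignored by default (skipna=True).
df['Sum'] = df[col_list].sum(axis=1)
```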
Pandas DataFrame eq() Method: The eq() method compares each value in a DataFrame with a specified value, or with the corresponding value from another DataFrame, and returns a DataFrame of boolean True/False results, one per comparison.
I think your method of computing lags is just fine:
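import pandas as pd
df = pd.DataFrame(range(4), columns=['col'])
# Lagged difference: each value minus the previous one
print(df['col'] - df['col'].shift())
# 0    NaN
# 1    1.0
# 2    1.0
# 3    1.0
# Name: col, dtype: float64
# Lagged sum: each value plus the previous one
print(df['col'] + df['col'].shift())
# 0    NaN
# 1    1.0
# 2    3.0
# 3    5.0
# Name: col, dtype: float64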
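On the first question, pandas also has a built-in `diff()` method that computes the same lagged difference as subtracting `shift()`. The sketch below uses made-up values chosen so that the differences match the numbers in the question; the variable names `manual` and `built_in` are illustrative only.

```python
import pandas as pd

# Made-up values whose consecutive differences are 45, 63, 23.
df = pd.DataFrame({'col': [5, 50, 113, 136]})

# Manual lag subtraction, as in the question.
manual = df['col'] - df['col'].shift()

# Built-in equivalent; periods defaults to 1.
built_in = df['col'].diff()

# Both produce NaN in the first position, then 45.0, 63.0, 23.0.
print(manual.equals(built_in))
```

Either form is fine; `diff()` is simply shorter for the common case.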
If you want NaN plus (or minus) a number to give the number (not NaN), use the add (or sub) method with fill_value=0:
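# fill_value=0 treats a missing operand as 0 before subtracting
print(df['col'].sub(df['col'].shift(), fill_value=0))
# 0    0.0
# 1    1.0
# 2    1.0
# 3    1.0
# Name: col, dtype: float64
# Same idea for addition
print(df['col'].add(df['col'].shift(), fill_value=0))
# 0    0.0
# 1    1.0
# 2    3.0
# 3    5.0
# Name: col, dtype: float64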