How to get log rate of change between rows in Pandas DataFrame effectively?

Question

Let's say I have some DataFrame (with about 10000 rows in my case, this is just a minimal example)

>>> import pandas as pd

>>> sample_df = pd.DataFrame(
        {'col1': list(range(1, 10)), 'col2': list(range(10, 19))})

>>> sample_df

   col1  col2
0     1    10
1     2    11
2     3    12
3     4    13
4     5    14
5     6    15
6     7    16
7     8    17
8     9    18

For my purposes, I need to calculate the series represented by ln(col_i(n+1) / col_i(n)) for each col_i in my DataFrame, where n represents a row number. How can I calculate this?

Background knowledge

I know that I can get the difference between each column in a very simple way using

>>> sample_df.diff()

   col1  col2
0   NaN   NaN
1     1     1
2     1     1
3     1     1
4     1     1
5     1     1
6     1     1
7     1     1
8     1     1

Or the percentage change, which is (col_i(n+1) - col_i(n))/col_i(n+1), using

>>> sample_df.pct_change()

       col1      col2
0       NaN       NaN
1  1.000000  0.100000
2  0.500000  0.090909
3  0.333333  0.083333
4  0.250000  0.076923
5  0.200000  0.071429
6  0.166667  0.066667
7  0.142857  0.062500
8  0.125000  0.058824

I have just been struggling with a straightforward way to get the direct division of each consecutive column by the previous. Were I to know how to do that even, I could just apply the natural logarithm to every element in the series after the fact.

Currently to solve my problem, I'm resorting to creating another column shifted with row elements down by 1 for each column and then applying the formula between the two columns. It seems messy and sub-optimal to me, though.

Any help would be greatly appreciated!

piRSquared · Accepted Answer

IIUC:

log of a ratio is the difference of logs:

sample_df.apply(np.log).diff()

Or better still:

np.log(sample_df).diff()

enter image description here

Timing

enter image description here

acushner · Answer

just use np.log:

np.log(df.col1 / df.col1.shift())

you can also use apply as suggested by @nikita but that will be slower.

in addition, if you wanted to do it for the entire dataframe, you could just do:

np.log(df / df.shift())

How to get log rate of change between rows in Pandas DataFrame effectively?

Tags:

python

pandas

dataframe

numpy

series

Background knowledge

Eric Hansen

2 Answers

Timing

piRSquared

acushner

Recent Activity

Donate For Us

How to get log rate of change between rows in Pandas DataFrame effectively?

Tags:

python

pandas

dataframe

numpy

series

Background knowledge

Eric Hansen

2 Answers

Timing

piRSquared

acushner

Related questions

Recent Activity

Donate For Us