Assume I have a DataFrame of the following form where the first column is a random number, and the other columns will be based on the value in the previous column.
For ease of use, let's say I want each number to be the previous one squared. So it would look like the below.
I know I can write a pretty simple loop to do this, but I also know looping is usually not the most efficient approach in Python/pandas. How could this be done with apply() or rolling_apply()? Or could it otherwise be done more efficiently?
My (failed) attempts below:
In [12]: a = pandas.DataFrame({0:[1,2,3,4,5],1:0,2:0,3:0})
In [13]: a
Out[13]:
0 1 2 3
0 1 0 0 0
1 2 0 0 0
2 3 0 0 0
3 4 0 0 0
4 5 0 0 0
In [14]: a = a.apply(lambda x: x**2)
In [15]: a
Out[15]:
0 1 2 3
0 1 0 0 0
1 4 0 0 0
2 9 0 0 0
3 16 0 0 0
4 25 0 0 0
In [16]: a = pandas.DataFrame({0:[1,2,3,4,5],1:0,2:0,3:0})
In [17]: pandas.rolling_apply(a,1,lambda x: x**2)
C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\spyderlib\widgets\externalshell\start_ipython_kernel.py:1: FutureWarning: pd.rolling_apply is deprecated for DataFrame and will be removed in a future version, replace with
DataFrame.rolling(center=False,window=1).apply(args=<tuple>,kwargs=<dict>,func=<function>)
# -*- coding: utf-8 -*-
Out[17]:
0 1 2 3
0 1.0 0.0 0.0 0.0
1 4.0 0.0 0.0 0.0
2 9.0 0.0 0.0 0.0
3 16.0 0.0 0.0 0.0
4 25.0 0.0 0.0 0.0
In [18]: a = pandas.DataFrame({0:[1,2,3,4,5],1:0,2:0,3:0})
In [19]: a = a[:-1]**2
In [20]: a
Out[20]:
0 1 2 3
0 1 0 0 0
1 4 0 0 0
2 9 0 0 0
3 16 0 0 0
So, my issue is mostly how to refer to the previous column value in my DataFrame calculations.
What you're describing is a recurrence relation, and I don't think there is currently any non-loop way to do that. Things like apply and rolling_apply still rely on having all the needed data available before they begin, and on outputting all the result data at once at the end. That is, they don't allow you to compute the next value using earlier values of the same series. See this question and this one as well as this pandas issue.
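To illustrate the point, here is a small sketch using the frame from the question. The helper name square_and_record and the seen dictionary are made up for this example; they only record what apply hands to the function for each column.

```python
import pandas as pd

# Same starting frame as in the question.
a = pd.DataFrame({0: [1, 2, 3, 4, 5], 1: 0, 2: 0, 3: 0})

seen = {}  # hypothetical helper: column label -> values passed in by apply

def square_and_record(col):
    # apply passes each column as it exists in the ORIGINAL frame;
    # results computed for earlier columns are not visible here.
    seen[col.name] = col.tolist()
    return col ** 2

result = a.apply(square_and_record)

print(seen[1])    # the untouched placeholder values of column 1
print(result)
```

Column 1 arrives as the original zeros rather than as the squares of column 0, which is why the apply attempt in the question leaves columns 1 to 3 unchanged.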
In practical terms, for your example, you only have three columns you want to fill in, so doing a three-pass loop (as shown in some of the other answers) will probably not be a major performance hit.
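A minimal sketch of such a loop, assuming the same frame as in the question and the "previous column squared" rule. The loop runs once per column to fill, and each pass is a vectorized operation over all rows, so the Python-level overhead stays small.

```python
import pandas as pd

# Column 0 holds the seed values; columns 1-3 are placeholders to fill in.
a = pd.DataFrame({0: [1, 2, 3, 4, 5], 1: 0, 2: 0, 3: 0})

# Walk over adjacent column pairs: (0, 1), (1, 2), (2, 3).
cols = list(a.columns)
for prev, cur in zip(cols, cols[1:]):
    # Each column depends on the column just computed in the previous pass.
    a[cur] = a[prev] ** 2

print(a)
```

With this frame, the row that starts with 2 becomes 2, 4, 16, 256. Any other rule that depends only on the previous column can be swapped in for the squaring step.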