I have a dataframe row to perform calculations with, and I need to use values from the following (and potentially preceding) rows in those calculations (essentially a perfect forecast based on the real data set). I get each row from an earlier df.apply call, so I could pass the whole df along to the downstream objects, but that seems less than ideal given the complexity of the objects in my analysis.
I found one closely related question and answer [1], but my problem is fundamentally different: I do not need the whole df for my calcs, only the following x rows (which might matter for large dfs).
So, for example:
import pandas as pd

df = pd.DataFrame([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
                  columns=['PRICE'])
horizon = 3
I need to access the values in the following 3 (horizon) rows in my row-wise df.apply call. How can I get a naive forecast of the next 3 data points dynamically in my row-wise apply calcs? E.g. for the first row, where the PRICE is 100, I need to use [200, 300, 400] as a forecast in my calcs.
[1] apply a function to a pandas Dataframe whose returned value is based on other rows
Use the apply() function when you want to update every row in a pandas DataFrame by calling a custom function. To apply a function to every row, pass axis=1 to apply(). Applying a function to each row lets you create a new column from the row's values, update the row, etc.
The DataFrame apply() function applies a function along an axis of the DataFrame. The objects passed to the function are Series whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
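To make the two axis settings concrete, here is a minimal sketch. The QTY column and the variable names are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical two-column frame; QTY is an invented column for this sketch.
df = pd.DataFrame({"PRICE": [100, 200, 300], "QTY": [1, 2, 3]})

# axis=0: the function receives each column as a Series indexed by the row labels.
col_totals = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series indexed by the column names.
row_values = df.apply(lambda row: row["PRICE"] * row["QTY"], axis=1)
```

With axis=1 the row's index label is available as row.name, which is what the answer below relies on.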
You can set a single cell of a pandas DataFrame with df.at[row_label, column_label] = 'Cell Value'. The at accessor reads or writes one scalar value for a row/column pair by label, and is a fast way to do so.
To select a row of a pandas DataFrame by its integer position, use the iloc indexer. It takes the row number in square brackets, e.g. df.iloc[0], rather than being called like a function.
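A short sketch of both accessors, assuming a small frame with string row labels (the labels "a", "b", "c" are invented for the example):

```python
import pandas as pd

# Hypothetical frame with string labels so label access and position access differ.
df = pd.DataFrame({"PRICE": [100, 200, 300]}, index=["a", "b", "c"])

# .at reads or writes a single cell by row label and column label.
df.at["b", "PRICE"] = 250

# .iloc selects by integer position; it is an indexer, so it uses square brackets.
second_row = df.iloc[1]
next_two = df["PRICE"].iloc[1:3].tolist()
```

Position-based slicing with iloc is what makes "the next x rows" easy to express, since a slice that runs past the end of the frame is simply shortened rather than raising an error.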
By getting the row's index inside the df.apply() call using row.name, you can generate the 'forecast' data relative to the row you are currently on. This is effectively a preprocessing step that puts the 'forecast' onto the relevant row, or it could be done as part of the initial df.apply() call if the df is available downstream.
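import pandas as pd

df = pd.DataFrame(
    [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    columns=["PRICE"]
)
horizon = 3
# x.name is the row's index label; with the default RangeIndex it equals the row position.
df["FORECAST"] = df.apply(
    lambda x: df["PRICE"].iloc[x.name + 1 : x.name + horizon + 1].tolist(),
    axis=1
)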
Results in this:
   PRICE          FORECAST
0    100   [200, 300, 400]
1    200   [300, 400, 500]
2    300   [400, 500, 600]
3    400   [500, 600, 700]
4    500   [600, 700, 800]
5    600   [700, 800, 900]
6    700  [800, 900, 1000]
7    800       [900, 1000]
8    900            [1000]
9   1000                []
The FORECAST column can then be used in your row-wise df.apply() calcs.
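As a rough illustration of that downstream step, here is a sketch that assumes FORECAST holds plain Python lists of numbers, as in the table above. The expected_change function and the EXPECTED_CHANGE column are hypothetical stand-ins for whatever calculation you actually run.

```python
import pandas as pd

horizon = 3
df = pd.DataFrame({"PRICE": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]})

# Build the forecast column (assumes the default RangeIndex, so label == position).
df["FORECAST"] = df.apply(
    lambda x: df["PRICE"].iloc[x.name + 1 : x.name + horizon + 1].tolist(),
    axis=1,
)

def expected_change(row):
    """Hypothetical calc: mean of the forecast minus the current price."""
    forecast = row["FORECAST"]
    if not forecast:
        # Last row has no following data points.
        return float("nan")
    return sum(forecast) / len(forecast) - row["PRICE"]

df["EXPECTED_CHANGE"] = df.apply(expected_change, axis=1)
```

Note that the rows near the end of the frame get shorter forecasts (and the last row an empty one), so the downstream function has to decide how to handle them.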
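EDIT: If you would rather store each forecast as a Series whose index is reset to start at 0 (dropping the original row labels):
# The list wrapper keeps apply() from expanding the returned Series into columns,
# so each cell holds a one-element list containing the Series.
df["FORECAST"] = df.apply(
    lambda x: [df["PRICE"].iloc[x.name + 1 : x.name + horizon + 1].reset_index(drop=True)],
    axis=1
)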
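One caveat: using row.name as a position only works when the frame has the default 0..n-1 index. If your index is something else, a possible variant (a sketch, assuming the index labels are unique) is to translate the label into a position with Index.get_loc first. The string labels and the forecast_for helper below are invented for the example.

```python
import pandas as pd

horizon = 3
# Hypothetical frame with a non-numeric, unique index.
df = pd.DataFrame(
    {"PRICE": [100, 200, 300, 400, 500]},
    index=["a", "b", "c", "d", "e"],
)

def forecast_for(row):
    # Map the row's label to its integer position (requires unique labels).
    pos = df.index.get_loc(row.name)
    return df["PRICE"].iloc[pos + 1 : pos + 1 + horizon].tolist()

df["FORECAST"] = df.apply(forecast_for, axis=1)
```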