Apply function to pandas dataframe row using values in other rows

I have a situation where I need to perform calculations on a DataFrame row, and those calculations need values from the following (and potentially preceding) rows. Essentially, this is a perfect forecast based on the real data set. I get each row from an earlier df.apply call, so I could pass the whole DataFrame along to the downstream objects, but given the complexity of the objects in my analysis that seems less than ideal.

I found one closely related question and answer [1], but my problem is fundamentally different: I do not need the whole DataFrame for my calculations, only the next x rows (which might matter for large DataFrames).

So, for example:

import pandas as pd

df = pd.DataFrame([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
                  columns=['PRICE'])
horizon = 3

I need to access the values in the following 3 (horizon) rows inside my row-wise df.apply call. How can I dynamically get a naive forecast of the next 3 data points in those row-wise calculations? For example, for the first row, where PRICE is 100, I need to use [200, 300, 400] as the forecast.
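The requirement can be pictured as lining up the next `horizon` prices beside each row. One way to do that, not taken from the original thread, is `Series.shift` with a negative period, which pulls later values up onto earlier rows. A minimal sketch, assuming the example data above; the column names `t+1`, `t+2`, ... and the variable names `ahead` and `wide` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
                  columns=["PRICE"])
horizon = 3

# Column "t+k" holds the PRICE k rows ahead. shift(-k) moves later values up,
# so the last k rows get NaN because there is no future data for them.
ahead = pd.concat(
    {f"t+{k}": df["PRICE"].shift(-k) for k in range(1, horizon + 1)},
    axis=1,
)
wide = df.join(ahead)
print(wide)
```

In a later `wide.apply(..., axis=1)`, `row["t+1"]` through `row["t+3"]` then hold the forecast for that row, with NaN where the data runs out. The NaN values also turn those columns into floats.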

[1] apply a function to a pandas Dataframe whose returned value is based on other rows

asked May 10 '16 at 21:05 by lukewitmer


1 Answer

Inside a row-wise df.apply() call, row.name holds the current row's index label (with the default RangeIndex, that label is also the row's position), so you can slice out the 'forecast' data relative to the row you are currently on. This is effectively a preprocessing step that puts the 'forecast' onto the relevant row; alternatively, it could be done as part of the initial df.apply() call if the DataFrame is available there.

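import pandas as pd

df = pd.DataFrame(
    [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    columns=["PRICE"]
)
horizon = 3

# x.name is the row's index label, which equals its position with the default RangeIndex.
# .tolist() makes each cell a plain list, so apply returns one list per row.
df["FORECAST"] = df.apply(
    lambda x: df["PRICE"].iloc[x.name + 1 : x.name + horizon + 1].tolist(),
    axis=1
)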

This results in:

   PRICE          FORECAST
0    100   [200, 300, 400]
1    200   [300, 400, 500]
2    300   [400, 500, 600]
3    400   [500, 600, 700]
4    500   [600, 700, 800]
5    600   [700, 800, 900]
6    700  [800, 900, 1000]
7    800       [900, 1000]
8    900            [1000]
9   1000                []

The FORECAST column can then be used in your row-wise df.apply() calculations.
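As a sketch of that downstream use, the block below rebuilds the FORECAST column (with `.tolist()`, so each cell is a plain list) and feeds it to a row-wise calculation. The function `expected_change` is hypothetical and not part of the original question; it simply compares the mean of the forecast with the current price.

```python
import pandas as pd

df = pd.DataFrame([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
                  columns=["PRICE"])
horizon = 3

# Preprocessing step: attach the next `horizon` prices to each row as a list.
df["FORECAST"] = df.apply(
    lambda x: df["PRICE"].iloc[x.name + 1 : x.name + horizon + 1].tolist(),
    axis=1,
)

def expected_change(row):
    """Hypothetical downstream calc: mean of the forecast minus the current price."""
    forecast = row["FORECAST"]
    if not forecast:  # the last row has no future data
        return float("nan")
    return sum(forecast) / len(forecast) - row["PRICE"]

df["EXPECTED_CHANGE"] = df.apply(expected_change, axis=1)
print(df)
```

Near the end of the frame the forecast is shorter than `horizon`, so any real calculation has to decide how to treat partial or empty forecasts; returning NaN here is just one choice.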

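EDIT: If you would rather keep each forecast as a pandas Series, re-indexed from 0 so that the original row labels are dropped, wrap the slice in a list. The list wrapper stops apply from expanding the returned Series into columns, so each cell holds a one-element list containing the Series:

df["FORECAST"] = df.apply(
    lambda x: [df["PRICE"].iloc[x.name + 1 : x.name + horizon + 1].reset_index(drop=True)],
    axis=1
)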
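The row.name approach relies on the index label doubling as a row position, which only holds for the default RangeIndex. One alternative, not from the original answer, is to skip apply and slice the underlying NumPy array by position in a list comprehension. A minimal sketch, assuming a daily DatetimeIndex chosen purely for illustration:

```python
import pandas as pd

# Same prices, but with a DatetimeIndex: row.name would be a Timestamp here,
# so `row.name + 1` cannot be used as a row position.
df = pd.DataFrame(
    {"PRICE": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]},
    index=pd.date_range("2016-05-01", periods=10, freq="D"),
)
horizon = 3

prices = df["PRICE"].to_numpy()
# Slice by position i, so the index labels never matter; slices past the end
# simply come back shorter (or empty) instead of raising.
df["FORECAST"] = pd.Series(
    [prices[i + 1 : i + 1 + horizon].tolist() for i in range(len(prices))],
    index=df.index,
)
print(df)
```

Besides working with any index, this avoids building a Series for every row, which apply(axis=1) does.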
answered Sep 20 '22 at 18:09 by lukewitmer