Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to use pandas.DataFrame.rolling with a step greater than 1?

In R you can compute a rolling mean with a specified window that can shift by a specified amount each time.

However maybe I just haven't found it anywhere but it doesn't seem like you can do it in pandas or some other Python library?

Does anyone know of a way around this? I'll give you an example of what I mean:

example

Here we have bi-weekly data, and I am computing the two month moving average that shifts by 1 month which is 2 rows.

So in R I would do something like: two_month__movavg=rollapply(mydata,4,mean,by = 2,na.pad = FALSE) Is there no equivalent in Python?

EDIT1:

DATE  A DEMAND   ...     AA DEMAND  A Price
    0  2006/01/01 00:30:00  8013.27833   ...     5657.67500    20.03
    1  2006/01/01 01:00:00  7726.89167   ...     5460.39500    18.66
    2  2006/01/01 01:30:00  7372.85833   ...     5766.02500    20.38
    3  2006/01/01 02:00:00  7071.83333   ...     5503.25167    18.59
    4  2006/01/01 02:30:00  6865.44000   ...     5214.01500    17.53
like image 513
user8261831 Avatar asked Jan 22 '19 03:01

user8261831


People also ask

How does rolling work in Pandas?

Window Rolling Mean (Moving Average)The moving average calculation creates an updated average value for each row based on the window we specify. The calculation is also called a “rolling mean” because it's calculating an average of values within a specified range for each row as you go along the DataFrame.

What is the fastest way to iterate over Pandas DataFrame?

Vectorization is always the first and best choice. You can convert the data frame to NumPy array or into dictionary format to speed up the iteration workflow. Iterating through the key-value pair of dictionaries comes out to be the fastest way with around 280x times speed up for 20 million records.

How do you use greater than or equal to in Pandas?

Pandas DataFrame: ge() functionThe ge() function returns greater than or equal to of dataframe and other, element-wise. Equivalent to ==, =!, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison. Any single or multiple element data structure, or list-like object.


1 Answers

So, I know it is a long time since the question was asked, by I bumped into this same problem and when dealing with long time series you really would want to avoid the unnecessary calculation of the values you are not interested at. Since Pandas rolling method does not implement a step argument, I wrote a workaround using numpy.

It is basically a combination of the solution in this link and the indexing proposed by BENY.

def apply_rolling_data(data, col, function, window, step=1, labels=None):
    """Perform a rolling window analysis at the column `col` from `data`

    Given a dataframe `data` with time series, call `function` at
    sections of length `window` at the data of column `col`. Append
    the results to `data` at a new columns with name `label`.

    Parameters
    ----------
    data : DataFrame
        Data to be analyzed, the dataframe must stores time series
        columnwise, i.e., each column represent a time series and each
        row a time index
    col : str
        Name of the column from `data` to be analyzed
    function : callable
        Function to be called to calculate the rolling window
        analysis, the function must receive as input an array or
        pandas series. Its output must be either a number or a pandas
        series
    window : int
        length of the window to perform the analysis
    step : int
        step to take between two consecutive windows
    labels : str
        Name of the column for the output, if None it defaults to
        'MEASURE'. It is only used if `function` outputs a number, if
        it outputs a Series then each index of the series is going to
        be used as the names of their respective columns in the output

    Returns
    -------
    data : DataFrame
        Input dataframe with added columns with the result of the
        analysis performed

    """

    x = _strided_app(data[col].to_numpy(), window, step)
    rolled = np.apply_along_axis(function, 1, x)

    if labels is None:
        labels = [f"metric_{i}" for i in range(rolled.shape[1])]

    for col in labels:
        data[col] = np.nan

    data.loc[
        data.index[
            [False]*(window-1)
            + list(np.arange(len(data) - (window-1)) % step == 0)],
        labels] = rolled

    return data


def _strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    """returns an array that is strided
    """
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(
        a, shape=(nrows, L), strides=(S*n, n))
like image 133
pgaluzio Avatar answered Sep 21 '22 18:09

pgaluzio