Returning two values from pandas.rolling_apply

Tags: python, pandas

I am using pandas.rolling_apply to fit data to a distribution and get a value from it, but I also need it to report a rolling goodness of fit (specifically, the p-value). Currently I'm doing it like this:

import pandas as pd
from scipy.stats import genextreme, kstest

def func(sample):
    fit = genextreme.fit(sample)
    return genextreme.isf(0.9, *fit)

def p_value(sample):
    fit = genextreme.fit(sample)
    return kstest(sample, 'genextreme', fit)[1]

values = pd.rolling_apply(data, 30, func)
p_values = pd.rolling_apply(data, 30, p_value)
results = pd.DataFrame({'values': values, 'p_value': p_values})

The problem is that I have a lot of data, and the fit function is expensive, so I don't want to call it twice for every sample. What I'd rather do is something like this:

def func(sample):
    fit = genextreme.fit(sample)
    value = genextreme.isf(0.9, *fit)
    p_value = kstest(sample, 'genextreme', fit)[1]
    return {'value': value, 'p_value': p_value}

results = pd.rolling_apply(data, 30, func)

Here results would be a DataFrame with two columns. If I try to run this, I get an exception: TypeError: a float is required. Is it possible to achieve this, and if so, how?

aquavitae asked Mar 06 '14 07:03


2 Answers

I had a similar problem and solved it by passing a member function of a separate helper class to apply. That member function returns a single value, as required, but I store the other calculation results as members of the class and can use them afterwards.

Simple Example:

class CountCalls:
    def __init__(self):
        self.counter = 0

    def your_function(self, window):
        retval = f(window)  # f is your own function; it must return a single number
        self.counter = self.counter + 1  # extra result kept on the instance
        return retval


TestCounter = CountCalls()

your_seriesOrDataframeColumn.rolling(window=your_window_size).apply(TestCounter.your_function)

print(TestCounter.counter)

Assume your function f returns a tuple of two values, v1 and v2. You can return v1 and assign the result to a column column_v1 of your dataframe. The second value, v2, you simply accumulate in a Series series_val2 within the helper class. Afterwards you just assign that series as a new column of your dataframe.
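
A minimal sketch of the helper-class idea described above, written against the newer Series.rolling(...).apply API. The class name TwoValueRoller, its method step, and the stand-in calculation (mean and spread instead of an expensive fit) are made up for illustration. It also assumes the data contains no NaNs, so that the function runs exactly once per full window.

```python
import numpy as np
import pandas as pd


class TwoValueRoller:
    """Returns one value to rolling().apply() and keeps the second on the side."""

    def __init__(self):
        self.second_values = []  # one entry per window that was evaluated

    def step(self, window):
        # Stand-in for the expensive fit: both results come from a single pass.
        first = window.mean()
        second = window.max() - window.min()
        self.second_values.append(second)
        return first  # rolling().apply() needs a single float back


data = pd.Series(np.arange(10, dtype=float))
window_size = 4

roller = TwoValueRoller()
first = data.rolling(window_size).apply(roller.step, raw=True)

# Assumption: no NaNs in data, so the first evaluated window ends at
# position window_size - 1 and every later position gets exactly one call.
second = pd.Series(roller.second_values, index=data.index[window_size - 1:])

results = pd.DataFrame({"value": first, "spread": second})
print(results)
```

If the data can contain NaNs, the stored values would need to be aligned with the rows that were actually evaluated rather than with a fixed slice of the index.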

JML64 answered Sep 30 '22 07:09


I had a similar problem before. Here's my solution for it:

from collections import deque
class your_multi_output_function_class:
    def __init__(self):
        self.deque_2 = deque()
        self.deque_3 = deque()

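    def f1(self, window):
        # Run the expensive function once and queue the extra outputs
        self.k = somefunction(window)
        self.deque_2.append(self.k[1])
        self.deque_3.append(self.k[2])
        return self.k[0]

    def f2(self, window):
        # Replay the queued second output; the window contents are ignored
        return self.deque_2.popleft()

    def f3(self, window):
        return self.deque_3.popleft()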

func = your_multi_output_function_class()

output = your_pandas_object.rolling(window=10).agg(
    {'a':func.f1,'b':func.f2,'c':func.f3}
    )
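
To make the pattern above concrete, here is a small end-to-end sketch on toy data. The function mean_min_max, the class name MultiOutput, and the three-column frame are hypothetical stand-ins. The sketch assumes that agg processes the dict entries in order (so f1 has filled the queues before f2 and f3 drain them) and that all three columns yield the same number of full windows.

```python
from collections import deque

import numpy as np
import pandas as pd


def mean_min_max(window):
    """Hypothetical stand-in for an expensive function with three outputs."""
    return window.mean(), window.min(), window.max()


class MultiOutput:
    def __init__(self):
        self.mins = deque()
        self.maxs = deque()

    def f1(self, window):
        mean, lo, hi = mean_min_max(window)  # computed once per window
        self.mins.append(lo)
        self.maxs.append(hi)
        return mean

    def f2(self, window):
        return self.mins.popleft()  # replay, ignoring the window contents

    def f3(self, window):
        return self.maxs.popleft()


values = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
# Column 'a' supplies the data; 'b' and 'c' exist only to drive the replay.
frame = pd.DataFrame({"a": values, "b": values, "c": values})

func = MultiOutput()
output = frame.rolling(window=3).agg({"a": func.f1, "b": func.f2, "c": func.f3})
print(output)
```

Because the replay depends on call order, it is worth checking that both queues are empty afterwards; leftover entries would mean the columns got out of step.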

Yi Yu answered Sep 30 '22 06:09