Pearsonr: TypeError: No loop matching the specified signature and casting was found for ufunc add

I have a time-series pandas DataFrame called "df". It has one column and shape (2000, 1). Its first rows are shown below:

            Weight
Date    
2004-06-01  1.9219
2004-06-02  1.8438
2004-06-03  1.8672
2004-06-04  1.7422
2004-06-07  1.8203

Goal

I am using a nested for loop to calculate the correlation between the lagged percentage change and the future percentage change of the "Weight" variable, over various time lags and holding periods. The aim is to evaluate the impact of holding livestock over time periods of various lengths.
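For readers unfamiliar with the two transforms, here is a minimal sketch of what "lagged" and "future" percentage change mean. It uses only the five rows shown above. The names `weight`, `lag` and `hold` are illustrative, and passing `fill_method=None` is an assumption made here so that pandas does not forward-fill gaps before computing the change.

```python
import pandas as pd

# The five rows shown in the question, used as a tiny stand-in for df
weight = pd.Series(
    [1.9219, 1.8438, 1.8672, 1.7422, 1.8203],
    index=pd.to_datetime(
        ["2004-06-01", "2004-06-02", "2004-06-03", "2004-06-04", "2004-06-07"]
    ),
    name="Weight",
)

lag = 1   # hypothetical values chosen for illustration
hold = 1

# Change over the previous `lag` rows: w[t] / w[t - lag] - 1
lagged = weight.pct_change(periods=lag, fill_method=None)

# Change over the next `hold` rows: w[t + hold] / w[t] - 1
future = weight.shift(-hold).pct_change(periods=hold, fill_method=None)

print(pd.DataFrame({"lagged": lagged, "future": future}))
```

Rows where either column is NaN (the start and end of the series) are the ones the loop later has to drop before calling pearsonr.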

The loop can be found below:

from scipy.stats import pearsonr

# Loop for producing combinations of different timelags and holddays 
# and calculating the pearsonr correlation and p-value of each combination 

for timelags in [1, 5, 10, 25, 60, 120, 250]:
    for holddays in [1, 5, 10, 25, 60, 120, 250]:
        weight_change_lagged = df.pct_change(periods=timelags)
        weight_change_future = df.shift(-holddays).pct_change(periods=holddays)

        if (timelags >= holddays):
            indepSet=range(0, weight_change_lagged.shape[0], holddays)
        else:
            indepSet=range(0, weight_change_lagged.shape[0], timelags)

        weight_change_lagged = weight_change_lagged.iloc[indepSet]
        weight_change_future = weight_change_future.iloc[indepSet]

        not_na = (weight_change_lagged.notna() & weight_change_future.notna()).values

        (correlation, p_value) = pearsonr(weight_change_lagged[not_na], weight_change_future[not_na])
        print('%4i %4i %7.4f %7.4f' % (timelags, holddays, correlation, p_value))

The loop runs fine up to the point where it calculates the pearsonr correlation and p-value, i.e. at this line:

(correlation, p_value) = pearsonr(weight_change_lagged[not_na], weight_change_future[not_na])

It generates this error:

TypeError: no loop matching the specified signature and casting was found for ufunc add

Does anyone have any clues on how to fix my problem? I looked through the forums and found no answers that fit my exact requirements.

john_mon asked May 20 '20 12:05


1 Answer

Through random tinkering, I managed to solve my problem as follows:

SciPy's pearsonr function expects one-dimensional arrays or array-like inputs. This means that:

  • NumPy arrays of the input variables work.
  • pandas Series of the input variables work.

However, whole pandas DataFrames of the variables do not work, even if they contain only one column, because a DataFrame is two-dimensional.

So, I edited the problematic segment of the code as follows:

# Define an object containing observations that are not NA
not_na = (weight_change_lagged.notna() & weight_change_future.notna()).values

# Remove NA values before passing the data to the pearsonr function (not inside the call, as I had done):
weight_change_lagged = weight_change_lagged[not_na]
weight_change_future = weight_change_future[not_na]

# Pass pandas Series of the future and lagged variables to the function
(correlation, p_value) = pearsonr(weight_change_lagged['Weight'], weight_change_future['Weight'])

With just that slight modification, the code executes without hitches.
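For anyone who wants to run the whole loop end to end, here is a self-contained sketch of the same idea. It is not the original data: `df` is filled with synthetic values (an assumption), and I select the "Weight" Series once at the top so that every later object is one-dimensional. Passing `fill_method=None` and `.to_numpy()` are choices made for this sketch, not requirements of the fix.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in for the question's data (an assumption, not real data):
# 2000 business days of strictly positive values in a single "Weight" column
rng = np.random.default_rng(0)
dates = pd.bdate_range("2004-06-01", periods=2000, name="Date")
df = pd.DataFrame(
    {"Weight": 1.9 * np.exp(0.01 * rng.standard_normal(2000).cumsum())},
    index=dates,
)

weight = df["Weight"]  # single brackets: a 1-D Series, not a DataFrame
periods = [1, 5, 10, 25, 60, 120, 250]
results = []

for timelags in periods:
    for holddays in periods:
        # fill_method=None is a choice made for this sketch: no forward-filling
        lagged = weight.pct_change(periods=timelags, fill_method=None)
        future = weight.shift(-holddays).pct_change(periods=holddays, fill_method=None)

        # Same sampling rule as the original if/else: step by the smaller value
        step = min(timelags, holddays)
        lagged = lagged.iloc[::step]
        future = future.iloc[::step]

        # 1-D boolean mask of rows where both changes are defined
        not_na = lagged.notna() & future.notna()

        correlation, p_value = pearsonr(
            lagged[not_na].to_numpy(), future[not_na].to_numpy()
        )
        results.append((timelags, holddays, correlation, p_value))
        print("%4i %4i %7.4f %7.4f" % (timelags, holddays, correlation, p_value))
```

Because the mask is built from two Series, it is itself a one-dimensional boolean Series, so filtering with it simply drops the rows where either change is missing.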

Note:

If you use double square brackets, as shown below, you are passing a pandas DataFrame, not a Series, and the pearsonr function will throw an error:

weight_change_future[['Weight']]
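The difference between the two bracket styles is easy to check by looking at the shapes. The sketch below uses a small made-up frame (the values are the five rows from the question, and the variable names are illustrative) and also shows two ways to flatten a one-column DataFrame if that is what you have been handed.

```python
import numpy as np
import pandas as pd

# Small stand-in frame with one "Weight" column
df = pd.DataFrame({"Weight": [1.9219, 1.8438, 1.8672, 1.7422, 1.8203]})

series = df["Weight"]    # single brackets -> pandas Series (1-D)
frame = df[["Weight"]]   # double brackets -> pandas DataFrame (2-D)

print(type(series).__name__, series.shape, np.asarray(series).ndim)
print(type(frame).__name__, frame.shape, np.asarray(frame).ndim)

# Two ways to turn a one-column DataFrame into 1-D input
flat_a = frame["Weight"].to_numpy()
flat_b = frame.to_numpy().ravel()
```

Either `flat_a` or `flat_b` is a plain one-dimensional NumPy array, which is the kind of input described above as working.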

Thanks to everyone who tried to help; your questions led me to the answer.

john_mon answered Nov 15 '22 03:11