I have a timeseries Pandas dataframe called "df". It has one column and the following shape: (2000, 1). The head of the dataframe, below, shows its structure:
            Weight
Date
2004-06-01  1.9219
2004-06-02  1.8438
2004-06-03  1.8672
2004-06-04  1.7422
2004-06-07  1.8203
Goal
I am trying to use a "for-loop" to calculate the correlation between two percentage changes of the "Weight" variable: the change over a past window (the timelag) and the change over a future holding period (the holddays), for various combinations of the two. This is being done to evaluate the impact of holding livestock over time periods of various lengths.
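To make the two quantities concrete, here is a minimal sketch on a toy series. The values, and the names `toy`, `timelag` and `holdday`, are made up for illustration and are not part of the real data. Passing `fill_method=None` is my assumption so that missing values stay missing regardless of the pandas version.

```python
import numpy as np
import pandas as pd

# Toy values, made up for illustration (not from the real "Weight" data)
toy = pd.Series([100.0, 110.0, 99.0, 108.9])

timelag = 1   # look-back window, in rows
holdday = 1   # holding period, in rows

# Change over the previous `timelag` rows: toy[t] / toy[t - timelag] - 1
lagged = toy.pct_change(periods=timelag, fill_method=None)

# Change over the next `holdday` rows: toy[t + holdday] / toy[t] - 1
# (NaN for the first and the last `holdday` rows)
future = toy.shift(-holdday).pct_change(periods=holdday, fill_method=None)

print(pd.DataFrame({"lagged": lagged, "future": future}))
```

The correlation is then taken between the `lagged` and `future` columns over the rows where both are defined.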
The loop can be found below:
from scipy.stats import pearsonr

# Loop for producing combinations of different timelags and holddays
# and calculating the pearsonr correlation and p-value of each combination
for timelags in [1, 5, 10, 25, 60, 120, 250]:
    for holddays in [1, 5, 10, 25, 60, 120, 250]:
        weight_change_lagged = df.pct_change(periods=timelags)
        weight_change_future = df.shift(-holddays).pct_change(periods=holddays)
        # Sample every min(timelags, holddays) rows
        if (timelags >= holddays):
            indepSet = range(0, weight_change_lagged.shape[0], holddays)
        else:
            indepSet = range(0, weight_change_lagged.shape[0], timelags)
        weight_change_lagged = weight_change_lagged.iloc[indepSet]
        weight_change_future = weight_change_future.iloc[indepSet]
        not_na = (weight_change_lagged.notna() & weight_change_future.notna()).values
        (correlation, p_value) = pearsonr(weight_change_lagged[not_na], weight_change_future[not_na])
        print('%4i %4i %7.4f %7.4f' % (timelags, holddays, correlation, p_value))
The loop itself runs, but it fails when it reaches the line that calculates the pearsonr correlation and p-value, i.e. at this line:
(correlation, p_value) = pearsonr(weight_change_lagged[not_na], weight_change_future[not_na])
It generates this error:
TypeError: no loop matching the specified signature and casting was found for ufunc add
Does anyone have any clues on how to fix my problem? I looked through the forums and found no answers that fit my exact requirements.
Through random tinkering, I managed to solve my problem as follows:
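scipy's pearsonr function only accepts one-dimensional arrays or array-like inputs. This means that a Pandas Series of each variable works.
However, complete Pandas DataFrames of the variables, even if they contain only one column, do not work, because a DataFrame is two-dimensional.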
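A minimal sketch of the difference, using the five values from the head of my dataframe as a stand-in for the full data. The variable names here are only for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for df, built from the five rows shown in the question
df = pd.DataFrame({"Weight": [1.9219, 1.8438, 1.8672, 1.7422, 1.8203]})

one_col_frame = df[["Weight"]]   # double brackets: still a DataFrame
series = df["Weight"]            # single brackets: a Series

# Number of dimensions once each object is converted to a NumPy array
frame_ndim = np.asarray(one_col_frame).ndim
series_ndim = np.asarray(series).ndim

print(type(one_col_frame).__name__, frame_ndim)   # DataFrame 2
print(type(series).__name__, series_ndim)         # Series 1
```

The one-column DataFrame converts to an array of shape (5, 1), while the Series converts to shape (5,), which is the one-dimensional input described above.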
So, I edited the problematic segment of the code as follows:
# Define a boolean mask marking the observations that are not NA in either variable
not_na = (weight_change_lagged.notna() & weight_change_future.notna()).values
# Remove NA values before passing the data to the pearsonr function (not inside the call, as I had done):
weight_change_lagged = weight_change_lagged[not_na]
weight_change_future = weight_change_future[not_na]
# Pass Pandas Series of the future and lagged variables to the function
(correlation, p_value) = pearsonr(weight_change_lagged['Weight'], weight_change_future['Weight'])
With just that slight modification, the code executes without hitches.
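For anyone who wants to try this end to end, below is a self-contained sketch of the whole loop. It is not my exact code: the data is a synthetic random walk standing in for the real "Weight" column, the `results` list and the `step` variable are additions for illustration, the column is pulled out as a Series before the loop instead of just before the pearsonr call, and `fill_method=None` is an assumption to keep missing values from being forward-filled by pct_change.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in for df: 2000 business days of a random-walk "Weight"
rng = np.random.default_rng(0)
dates = pd.bdate_range("2004-06-01", periods=2000, name="Date")
values = 2.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=2000)))
df = pd.DataFrame({"Weight": values}, index=dates)

weight = df["Weight"]  # a Series, so everything derived from it is 1-D
periods = [1, 5, 10, 25, 60, 120, 250]
results = []  # hypothetical container, added only to keep the numbers

for timelags in periods:
    for holddays in periods:
        lagged = weight.pct_change(periods=timelags, fill_method=None)
        future = weight.shift(-holddays).pct_change(periods=holddays, fill_method=None)

        # Sample every min(timelags, holddays) rows, as in the original if/else
        step = min(timelags, holddays)
        lagged = lagged.iloc[::step]
        future = future.iloc[::step]

        # Keep only the rows where both changes are defined
        not_na = lagged.notna() & future.notna()
        correlation, p_value = pearsonr(lagged[not_na], future[not_na])

        results.append((timelags, holddays, correlation, p_value))
        print('%4i %4i %7.4f %7.4f' % (timelags, holddays, correlation, p_value))
```

Because `weight` is a Series from the start, the boolean mask is one-dimensional too, and no column selection is needed in the pearsonr call.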
Note:
If you use double square brackets, as follows, you are passing a pandas DataFrame, not a Series, and the pearsonr function will throw an error:
weight_change_future[['Weight']]
Thanks to everyone who tried to help; your questions led me to the answer.