
Speed up pandas dataframe iteration

I have a dataframe with dates and values:

 Date     Price
Jun 30    95.60
Jun 29    94.40
Jun 28    93.59
Jun 27    92.04
Jun 24    93.40
Jun 23    96.10
Jun 22    95.55
Jun 21    95.91
Jun 20    95.10
Jun 17    95.33
Jun 16    97.55
Jun 15    97.14
Jun 14    97.46
Jun 13    97.34
Jun 10    98.83
Jun 9     99.65
Jun 8     98.94
Jun 7     99.03
Jun 6     98.63
Jun 3     97.92
Jun 2     97.72

There is a function which iterates through the dataframe:

indic_up = [False, False, False, False]
i = 4
while i + 4 <= df.index[-1]:
    # True if the current value exceeds any of the previous four values.
    # df.at replaces df.get_value, which was removed in pandas 1.0.
    if (df.at[i, 'value'] > df.at[i-1, 'value'] or
            df.at[i, 'value'] > df.at[i-2, 'value'] or
            df.at[i, 'value'] > df.at[i-3, 'value'] or
            df.at[i, 'value'] > df.at[i-4, 'value']):
        indic_up.append(True)
    else:
        indic_up.append(False)
    i = i + 1

The logic of this function is: if today's value is greater than yesterday's, the day before yesterday's, or either of the two values before that, the result is True; otherwise it is False. This function seems very slow to me. How can I rewrite it using something like these?

for index, row in df.iterrows():
    row['a'], index

or

for idx in df.index:
    df.loc[idx, 'a'], idx

Or can I make it faster by converting the dataframe into a NumPy array?

ProgR asked Oct 19 '22 06:10



1 Answer

Let's invite SciPy too!

The idea: take the minimum over the window made up of the current element and the previous four values, and compare it with the current element. If the two are equal, the current element is no greater than any of the previous four, so every comparison has failed and the result is False. Otherwise the current element exceeds at least one of them and the result is True. So, code-wise, just compare the current element with the minimum over that window. This is where SciPy comes in with its minimum_filter.
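To see why the minimum comparison is equivalent to the chain of four `>` tests, here is a small pure-Python check. It is only an illustrative sketch: the helper names are made up for this example, and the prices are a few rows from the question's table, assumed here to be ordered oldest first.

```python
# Prices taken from the question's table, assumed oldest first (Jun 2 .. Jun 14).
prices = [97.72, 97.92, 98.63, 99.03, 98.94, 99.65, 98.83, 97.34, 97.46]
LOOKBACK = 4  # number of previous values to compare against


def beats_any_previous(values, i):
    """Direct form: is values[i] greater than at least one of the previous four?"""
    return any(values[i] > values[i - k] for k in range(1, LOOKBACK + 1))


def differs_from_window_min(values, i):
    """Minimum form: does values[i] differ from the min of the 5-element window ending at i?"""
    return values[i] != min(values[i - LOOKBACK : i + 1])


# Evaluate both forms at every position that has a full lookback.
direct = [beats_any_previous(prices, i) for i in range(LOOKBACK, len(prices))]
via_min = [differs_from_window_min(prices, i) for i in range(LOOKBACK, len(prices))]

print(direct == via_min)
```

Both lists agree because the current element equals the window minimum exactly when none of the previous four values is smaller than it.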

Implementation :

import numpy as np
from scipy.ndimage import minimum_filter

# Extract values from the relevant column into a NumPy array for further processing
A = df['value'].values

# True where the current element differs from the minimum of the
# 5-element window ending at it (origin=2 shifts the window back)
indic_up_out = A != minimum_filter(A, footprint=np.ones((5,)), origin=2)

# Set the first four to False because they lack a full 5-element runway
indic_up_out[:4] = False
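If SciPy is not available, the same idea can probably be expressed with pandas alone. The sketch below is not part of the answer above: it swaps minimum_filter for a trailing `rolling(5).min()`, and the small sample frame (a few prices from the question, assumed oldest first, in a column named 'value') is only there to make it runnable.

```python
import pandas as pd

# Hypothetical sample frame: a few prices from the question, oldest first.
df = pd.DataFrame(
    {"value": [97.72, 97.92, 98.63, 99.03, 98.94, 99.65, 98.83, 97.34, 97.46]}
)

# Minimum over the current row and the previous four rows.
# The first four rows have no full window, so their minimum is NaN.
window_min = df["value"].rolling(window=5).min()

# True where the current value differs from the window minimum;
# rows without a full window are forced to False.
indic_up = (df["value"] != window_min) & window_min.notna()

print(indic_up.tolist())
```

Unlike the loop in the question, which stops four rows before the end, this marks every row of the frame, as the SciPy version does.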
Divakar answered Oct 22 '22 10:10
