Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy Conditional Max of Range

I'm trying to make a version of my program faster using as much Pandas and Numpy as possible. I am new to Numpy but have been grasping most of it, but I am having trouble with conditional formatting a column with the max of a range. This is the code I am trying to use to achieve this:

x=3
df1['Max']=numpy.where(df1.index>=x,max(df1.High[-x:],0))

Basically, I am trying to conditionally put the maximum value over the last 3 entries into a cell and repeat down the column. Any and all help is appreciated.

like image 600
BROB1 Avatar asked Feb 07 '23 12:02

BROB1


2 Answers

Use Scipy's maximum_filter -

from scipy.ndimage.filters import maximum_filter1d

df['max'] = maximum_filter1d(df.High,size=3,origin=1,mode='nearest')

Basically, maximum_filter operates in a sliding window looking for maximum in that window. Now, by default each such max computation would be performed with window being centered at the index itself. Since, we are looking to go three elements before and ending at the current one, we need to change that centeredness with the parameter origin. Therefore, we have it set at 1.

Sample run -

In [21]: df
Out[21]: 
   High  max
0    13   13
1    77   77
2    16   77
3    30   77
4    25   30
5    98   98
6    79   98
7    58   98
8    51   79
9    23   58

Runtime test

Got me interested to see how this Scipy's sliding max operation performs against Pandas's rolling max method on performance. Here's some results on big datasizes -

In [55]: df = pd.DataFrame(np.random.randint(0,99,(10000)),columns=['High'])

In [56]: %%timeit  # @Merlin's rolling based solution :
    ...: df['max'] = df.High.rolling(window=3, min_periods=1).max()
    ...: 
1000 loops, best of 3: 1.35 ms per loop

In [57]: %%timeit  # Using Scipy's max filter :
    ...: df['max1'] = maximum_filter1d(df.High,size=3,\
    ...: origin=1,mode='nearest')
    ...: 
1000 loops, best of 3: 487 µs per loop
like image 98
Divakar Avatar answered Feb 13 '23 05:02

Divakar


Here is the logic on np.where

 numpy.where('test something,if true ,if false)

I think you need below.

dd= {'to': [100, 200, 300, 400, -500, 600, 700,800, 900, 1000]}

df = pd.DataFrame(dd)
df

         to
0   100
1   200
2   300
3   400
4  -500
5   600
6   700
7   800
8   900
9  1000

df['Max'] =  df.rolling(window=3, min_periods=1).max()


   to     Max
0   100   100.0
1   200   200.0
2   300   300.0
3   400   400.0
4  -500   400.0
5   600   600.0
6   700   700.0
7   800   800.0
8   900   900.0
9  1000  1000.0
like image 41
Merlin Avatar answered Feb 13 '23 07:02

Merlin