
pandas series pairwise maximum

I want to find the pairwise maximum between each element in a pandas Series and 0. My crude solution is as follows:

import numpy as np
import pandas as pd
np.random.seed(1)

series = pd.Series(np.random.randn(100))
pmax = pd.Series([], dtype=float)  # explicit dtype; recent pandas defaults an empty Series to object
for i in range(len(series)):
    pmax[i] = max(series[i],0)

I need to run this on a large number of series, and this solution is too slow. Is there a vectorized approach to achieve the same result?

user3294195 asked Jan 28 '23 14:01



2 Answers

I was searching for a Python equivalent of R's pmax() and came across NumPy's maximum() function, which does exactly what pmax() does:

pmax(5,c(1,2,6))
[1] 5 5 6

And in Python:

>>> import numpy as np
>>> np.maximum(5, [1,2,6])
array([5, 5, 6])
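
Applied to the question's data, a minimal sketch might look like the following. It assumes a reasonably recent pandas, where NumPy ufuncs called on a Series return a Series with the index preserved; the names pmax and expected are only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
series = pd.Series(np.random.randn(100))

# Element-wise maximum of each value and the scalar 0 (the 0 is broadcast).
pmax = np.maximum(series, 0)

# Sanity check against an explicit Python-level loop like the one in the question.
expected = pd.Series([max(x, 0) for x in series])
assert (pmax == expected).all()
```

Because the comparison happens inside NumPy rather than in a Python loop, this avoids the per-element overhead of the original approach.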
nebroth answered Jan 31 '23 21:01



Setup

s = pd.Series([1,2,3,-1,-2,3,4,-5])

Using mask with 0 as fill value:

s.mask(s<0, 0)

0    1
1    2
2    3
3    0
4    0
5    3
6    4
7    0
dtype: int64

Using np.clip with no upper bound:

np.clip(s, 0, None)

@Coldspeed suggested using pd.Series.clip_lower (deprecated in pandas 0.24 and removed in 1.0; s.clip(lower=0) is the current equivalent):

s.clip_lower(0)
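
As a quick sanity check, the sketch below confirms that the approaches in this answer agree on the setup series. It assumes pandas 1.0 or later, so it uses s.clip(lower=0) in place of clip_lower; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, -1, -2, 3, 4, -5])

masked = s.mask(s < 0, 0)         # replace negatives with 0
np_clipped = np.clip(s, 0, None)  # lower bound 0, no upper bound
clipped = s.clip(lower=0)         # stands in for the removed clip_lower(0)

# All three should produce the same values.
assert masked.tolist() == np_clipped.tolist() == clipped.tolist()
print(clipped.tolist())  # [1, 2, 3, 0, 0, 3, 4, 0]
```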

Timings

In [204]: %%timeit
     ...: pmax = pd.Series([])
     ...: for i in range(len(series)):
     ...:     pmax[i] = max(series[i],0)
     ...:
81.2 ms ± 4.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [205]: %timeit series.mask(series<0, 0)
626 µs ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [206]: %timeit np.clip(series, 0, None)
124 µs ± 3.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [209]: %timeit series.clip_lower(0)
97.2 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
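
The numbers above depend on the machine and on the pandas/NumPy versions. To reproduce the comparison outside IPython, something along these lines could be used. It is only a sketch: the candidates dictionary and the repetition count are arbitrary choices, and clip(lower=0) stands in for clip_lower on the assumption of pandas 1.0 or later.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(1)
series = pd.Series(np.random.randn(100))

# Hypothetical labels mapped to the vectorized candidates being compared.
candidates = {
    "mask": lambda: series.mask(series < 0, 0),
    "np.clip": lambda: np.clip(series, 0, None),
    "clip(lower=0)": lambda: series.clip(lower=0),
    "np.maximum": lambda: np.maximum(series, 0),
}

n_runs = 1000  # arbitrary; raise it for more stable measurements
results = {}
for label, func in candidates.items():
    total = timeit.timeit(func, number=n_runs)
    results[label] = total / n_runs * 1e6  # average microseconds per call
    print(f"{label:>14}: {results[label]:.1f} µs per call")
```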
user3483203 answered Jan 31 '23 23:01
