
pandas version of numpy's ones_like

Consider the pd.Series s:

s = pd.Series([.4, .5, .6], list('abc'))
s

a    0.4
b    0.5
c    0.6
dtype: float64

I've done this before to get a series of ones:

pd.Series(np.ones_like(s.values), s.index, name=s.name)

a    1.0
b    1.0
c    1.0
dtype: float64

What is a better way?

piRSquared asked Mar 11 '23 15:03


1 Answer

You can use Series.copy with its deep parameter set to False, which skips copying the underlying data and speeds up the whole process. Then use ndarray.fill to overwrite every value in the series with 1.

Let's take a DF to illustrate, with half of its values set to NaN:

import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randn(10**6,), columns=['A'])
# Set a random half of the rows to NaN (np.NaN was removed in NumPy 2.0; use np.nan)
df.loc[df.sample(frac=0.5).index] = np.nan

df.shape
# (1000000, 1)

def fill_ones_with_modify():
    ser = df['A'].copy(deep=False)     # shallow copy: new Series object, data still shared with df
    ser.values.fill(1)
    return ser

%timeit fill_ones_with_modify()
1000 loops, best of 3: 837 µs per loop

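Note: This operates in place on the shared data, so the corresponding column of the DF is altered as well (filled with 1's). It also relies on ser.values being a writable view; with pandas' Copy-on-Write mode enabled, that array is read-only and fill raises an error.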
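To make the note above concrete, here is a minimal sketch that checks whether a shallow copy and a deep copy share memory with the original series. It reuses the small series from the question rather than the million-row DF; the variable names (shallow, deep, shares_shallow, shares_deep) and the series name "s" are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.4, 0.5, 0.6], index=list("abc"), name="s")

shallow = s.copy(deep=False)  # new Series object, same underlying buffer
deep = s.copy()               # deep=True is the default: the data is duplicated

# np.shares_memory reports whether two arrays overlap in memory.
shares_shallow = np.shares_memory(s.to_numpy(), shallow.to_numpy())
shares_deep = np.shares_memory(s.to_numpy(), deep.to_numpy())
print(shares_shallow, shares_deep)
```

If the shallow copy reports shared memory, any in-place write through it also lands in the original, which is why the first function is fast but not safe to use when the DF must stay intact.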


Another way is to select the series as a single-column DF, copy that, and squeeze it back into a Series object. This takes much longer, however, because the underlying data and the index are copied. Upside: it doesn't modify the referenced series object.

def fill_ones_without_modify():
    ser = df[['A']].copy(deep=False).squeeze()
    ser.values.fill(1)
    return ser

%timeit fill_ones_without_modify()
100 loops, best of 3: 6.4 ms per loop
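If leaving the original untouched matters more than raw speed, another option is to let the Series constructor broadcast a scalar over the existing index. This is only a sketch and was not part of the timings above; the helper name ones_like_series is hypothetical, and the series name "x" is added just for the example.

```python
import pandas as pd

def ones_like_series(s: pd.Series) -> pd.Series:
    """Return a new Series of ones with the index, name and dtype of `s`."""
    # A scalar passed as data is broadcast across the given index,
    # so fresh data is allocated and `s` is never written to.
    return pd.Series(1, index=s.index, name=s.name, dtype=s.dtype)

s = pd.Series([0.4, 0.5, 0.6], index=list("abc"), name="x")
ones = ones_like_series(s)
print(ones)
```

Because nothing is written through s.values, this does not depend on whether the values array is writable.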
Nickil Maveli answered Mar 28 '23 03:03