How to efficiently fillna(0) if series is all-nan, or else remaining non-nan entries are zero?

Question

Given that I have a pandas Series, I want to fill the NaNs with zero if either all the values are NaN or if all the values are either zero or NaN.

For example, I would want to fill the NaNs in the following Series with zeroes.

0       0
1       0
2       NaN
3       NaN
4       NaN
5       NaN
6       NaN
7       NaN
8       NaN

But, I would not want to fillna(0) the following Series:

0       0
1       0
2       2
3       0
4       NaN
5       NaN
6       NaN
7       NaN
8       NaN

I was looking at the documentation and it seems like I could use pandas.Series.value_counts to ensure the values are only 0 and NaN, and then simply call fillna(0).In other words, I am looking to check if set(s.unique().astype(str)).issubset(['0.0','nan']), THEN fillna(0), otherwise do not.

Considering how powerful pandas is, it seemed like a there may be a better way to do this. Does anyone have any suggestions to do this cleanly and efficiently?

Potential solution thanks to cᴏʟᴅsᴘᴇᴇᴅ

if s.dropna().eq(0).all():
    s = s.fillna(0)

jezrael · Accepted Answer

You can compare by 0 and isna if only NaNs and 0 and then fillna:

if ((s == 0) | (s.isna())).all():
    s = pd.Series(0, index=s.index)

Or compare unique values:

if pd.Series(s.unique()).fillna(0).eq(0).all():
    s = pd.Series(0, index=s.index)

@cᴏʟᴅsᴘᴇᴇᴅ solution, thank you - compare Series without NaNs with dropna:

 if s.dropna().eq(0).all():
    s = pd.Series(0, index=s.index)

Solution from question - need convert to strings, because problem with compare with NaNs:

if set(s.unique().astype(str)).issubset(['0.0','nan']):

    s = pd.Series(0, index=s.index)

Timings:

s = pd.Series(np.random.choice([0,np.nan], size=10000))

In [68]: %timeit ((s == 0) | (s.isna())).all()
The slowest run took 4.85 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 574 µs per loop

In [69]: %timeit pd.Series(s.unique()).fillna(0).eq(0).all()
1000 loops, best of 3: 587 µs per loop

In [70]: %timeit s.dropna().eq(0).all()
The slowest run took 4.65 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 774 µs per loop

In [71]: %timeit set(s.unique().astype(str)).issubset(['0.0','nan'])
The slowest run took 5.78 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 157 µs per loop

Alexander · Answer

Create a mask for the null values. Check if the length of the mask equals the length of the series (in which case the series are either all null values or empty) or if the non-nulls are all equal to zero. If so, create a new series of zero values using the original index from the series.

nulls = s.isnull()
if len(nulls) == len(s) or s[~nulls].eq(0).all():
    s = pd.Series(0, index=s.index)

TIMINGS

%%timeit s_ = pd.concat([s] * 100000)
nulls = s_.isnull()
if len(nulls) == len(s_) or s_[~nulls].eq(0).all():
    s_ = pd.Series(0, index=s_.index)
# 100 loops, best of 3: 2.33 ms per loop

# OP's solution:
%%timeit s_ = pd.concat([s] * 100000)
if s_.dropna().eq(0).all():
    s_ = s_.fillna(0)
# 10 loops, best of 3: 19.7 ms per loop

# @Jezrael's fastest solution:
%%timeit s_ = pd.concat([s] * 100000)
if set(s_.unique().astype(str)).issubset(['0.0','nan']):
    s_ = pd.Series(0, index=s_.index)
# 1000 loops, best of 3: 4.58 ms per loop

How to efficiently fillna(0) if series is all-nan, or else remaining non-nan entries are zero?

Tags:

python

pandas

conditional-statements

fillna

multiple-conditions

Lien

2 Answers

jezrael

Alexander

Recent Activity

Donate For Us

How to efficiently fillna(0) if series is all-nan, or else remaining non-nan entries are zero?

Tags:

python

pandas

conditional-statements

fillna

multiple-conditions

Lien

2 Answers

jezrael

Alexander

Related questions

Recent Activity

Donate For Us