I like np.where, but have never fully got to grips with it.
I have a DataFrame; let's say it looks like this:
import pandas as pd
import numpy as np
from numpy import nan as NA
DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})
Now what I want to do is replace the 0 values with NaN when all of the values in a row are zero. Critically, I want to keep whatever other values are in a row in the cases where not all of that row's values are zero.
I want to do something like this:
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, ???)
I put the ??? to indicate that I do not know what value to place there when the condition is False; I just want to preserve whatever is already there. Is this possible with np.where, or should I use another technique?
There is a pandas.Series method (where, incidentally) for exactly this kind of task. It seems a little backward at first but, quoting from the documentation:
Series.where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)
Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
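For instance, a minimal sketch on a toy Series (not the question's data) to show the "backward" feel: the values where the condition is True are kept, and everything else is replaced by other.

import pandas as pd

s = pd.Series([1, 2, 3, 4])
s.where(s > 2, other=-1)   # keep entries where s > 2, replace the rest with -1
# 0   -1
# 1   -1
# 2    3
# 3    4
# dtype: int64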
So, your example would become
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col].where(~condition, np.nan, inplace=True)
But if all you're trying to do is replace, for a specific set of columns, the rows that are all zeros with NA, you could do this instead:
DF.loc[condition, cols] = NA
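For completeness, a small end-to-end sketch using the sample frame from the question; the only all-zero rows are 8, 9 and 10, so the printed tail is what I would expect (the integer columns are upcast to float so they can hold NaN):

import numpy as np
import pandas as pd

DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)   # True only where the whole row is zero
DF.loc[condition, cols] = np.nan

print(DF.tail(4))
#       a    b    c    d
# 7   0.0  0.0  0.0  1.0
# 8   NaN  NaN  NaN  NaN
# 9   NaN  NaN  NaN  NaN
# 10  NaN  NaN  NaN  NaN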
EDIT
To answer your original question, np.where follows the same broadcasting rules as other array operations, so you would replace ??? with DF[col], changing your example to:
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, DF[col])
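If you want to avoid the column loop altogether, np.where also broadcasts: reshaping the boolean Series into a column vector lets one call cover all four columns at once. A sketch under the same cols and condition as above (the result is float, since NaN forces a float array):

DF[cols] = np.where(condition.to_numpy()[:, None],   # shape (n, 1) broadcasts across the columns
                    np.nan,
                    DF[cols])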
The proposed solutions work, but for a NumPy array there is a simpler way that does not go through a DataFrame at all. A solution would be:
np_array[np.where(condition)] = value_of_condition_true_rows
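For example, a small sketch with an illustrative 2-D float array (plain boolean indexing does the same job as np.where here):

import numpy as np

arr = np.array([[3., 3., 0., 5.],
                [0., 0., 0., 0.],
                [1., 1., 0., 2.]])

condition = (arr == 0).all(axis=1)   # rows that are entirely zero
arr[np.where(condition)] = np.nan    # equivalently: arr[condition] = np.nan
# arr is now:
# [[ 3.  3.  0.  5.]
#  [nan nan nan nan]
#  [ 1.  1.  0.  2.]]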