I like np.where, but have never fully got to grips with it.
I have a DataFrame; let's say it looks like this:
import pandas as pd
import numpy as np
from numpy import nan as NA
DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})
Now what I want to do is replace the 0 values with NaN when all of the values in a row are zero. Critically, I want to keep whatever other values are in a row in the cases where not all of that row's values are zero.
I want to do something like this:
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, ???)
I put the ??? to indicate that I do not know what value to place there when the condition is False; I just want to preserve whatever is already there. Is this possible with np.where, or should I use another technique?
There is a pandas.Series method (where, incidentally) for exactly this kind of task. It seems a little backward at first but, quoting from the documentation:
Series.where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)
Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
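For instance, a minimal sketch on a toy Series (not the question's data) to show the "backward" feel: the values where the condition is True are kept, and everything else is replaced by other.

import pandas as pd

s = pd.Series([1, 2, 3, 4])
s.where(s > 2, other=-1)   # keep entries where s > 2, replace the rest with -1
# 0   -1
# 1   -1
# 2    3
# 3    4
# dtype: int64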
So, your example would become
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col].where(~condition, np.nan, inplace=True)
But if all you're trying to do is replace, for a specific set of columns, the rows that are all zeros with NA, you could do this instead:
DF.loc[condition, cols] = NA
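For completeness, a small end-to-end sketch using the sample frame from the question; the only all-zero rows are 8, 9 and 10, so the printed tail is what I would expect (the integer columns are upcast to float so they can hold NaN):

import numpy as np
import pandas as pd

DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)   # True only where the whole row is zero
DF.loc[condition, cols] = np.nan

print(DF.tail(4))
#       a    b    c    d
# 7   0.0  0.0  0.0  1.0
# 8   NaN  NaN  NaN  NaN
# 9   NaN  NaN  NaN  NaN
# 10  NaN  NaN  NaN  NaN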
EDIT
To answer your original question, np.where follows the same broadcasting rules as other array operations, so you would replace ??? with DF[col], changing your example to:
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, DF[col])
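If you want to avoid the column loop altogether, np.where also broadcasts: reshaping the boolean Series into a column vector lets one call cover all four columns at once. A sketch under the same cols and condition as above (the result is float, since NaN forces a float array):

DF[cols] = np.where(condition.to_numpy()[:, None],   # shape (n, 1) broadcasts across the columns
                    np.nan,
                    DF[cols])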
The proposed solutions work, but for a NumPy array there is a simpler way that does not go through a DataFrame at all. A solution would be:
np_array[np.where(condition)] = value_of_condition_true_rows
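For example, a small sketch with an illustrative 2-D float array (plain boolean indexing does the same job as np.where here):

import numpy as np

arr = np.array([[3., 3., 0., 5.],
                [0., 0., 0., 0.],
                [1., 1., 0., 2.]])

condition = (arr == 0).all(axis=1)   # rows that are entirely zero
arr[np.where(condition)] = np.nan    # equivalently: arr[condition] = np.nan
# arr is now:
# [[ 3.  3.  0.  5.]
#  [nan nan nan nan]
#  [ 1.  1.  0.  2.]]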