 

Using np.where but maintaining existing values if condition is False

I like np.where, but have never fully got to grips with it.

I have a DataFrame; let's say it looks like this:

import pandas as pd
import numpy as np
from numpy import nan as NA
DF = pd.DataFrame({'a' : [ 3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b' : [ 3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c' : [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd' : [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})

Now what I want to do is replace the 0 values with NaN when all the values in a row are zero. Critically, I want to keep whatever other values are in the row when not all of its values are zero.
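As a quick check of which rows this rule would touch, here is a minimal sketch that rebuilds the example DataFrame and evaluates the same all-zeros row mask used in the code below. Rows such as 1, 3 and 7 contain zeros but also a 1 in column d, so they should be left alone; the variable name all_zero_rows is just for illustration.

```python
import pandas as pd

DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})

cols = ['a', 'b', 'c', 'd']
# True only for rows where every listed column equals 0
condition = (DF[cols] == 0).all(axis=1)

# Index labels of the rows that should become NaN
all_zero_rows = DF.index[condition.to_numpy()].tolist()
print(all_zero_rows)  # [8, 9, 10]
```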

I want to do something like this:

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, ???)

I put the ??? to indicate that I do not know what value to place there if the condition is False; I just want to preserve whatever is there already. Is this possible with np.where, or should I use another technique?

asked Sep 08 '14 at 04:09 by Woody Pride


People also ask

Why does np.where return a tuple?

When called with only a condition, numpy.where returns a tuple with one index array per dimension of the input. The first array holds the indices along the first dimension (for a 2-D array, the row indices) of the elements where the condition is True, the second array holds the indices along the second dimension, and so on.
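A small sketch of this behaviour on a 2-D array (the array and variable names are made up for illustration). It also contrasts the one-argument form with the three-argument form, which returns a single array rather than a tuple.

```python
import numpy as np

arr = np.array([[0, 1],
                [1, 0]])

# One-argument form: a tuple with one index array per dimension
result = np.where(arr == 1)
row_idx, col_idx = result
print(row_idx)  # [0 1]  (row of each match)
print(col_idx)  # [1 0]  (column of each match)

# Three-argument form: an array of the same shape, chosen element-wise
picked = np.where(arr == 1, 10, 0)
print(picked.tolist())  # [[0, 10], [10, 0]]
```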

How can I combine multiple conditions in np.where?

You can pass several conditions to numpy.where() by wrapping each condition in parentheses and joining them with the & operator (or | for "or"). For example, np.where((values > 2) & (values < 4)) selects the elements of an integer array values that are greater than 2 but less than 4.
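A minimal sketch, assuming a small made-up integer array named values. The parentheses are required because & binds more tightly than the comparison operators.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# Parenthesise each comparison before combining them with &
mask = (values > 2) & (values < 4)

indices = np.where(mask)[0]          # positions where both conditions hold
selected = values[np.where(mask)]    # the matching elements themselves
# Three-argument form: keep matching values, replace the rest with -1
replaced = np.where(mask, values, -1)

print(indices.tolist())   # [2]
print(selected.tolist())  # [3]
print(replaced.tolist())  # [-1, -1, 3, -1, -1]
```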


2 Answers

There is a pandas.Series method (where, incidentally) for exactly this kind of task. It seems a little backward at first, but from the documentation:

Series.where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
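To make the "backward" semantics concrete, here is a short sketch on a made-up Series: entries are kept where the condition is True and replaced where it is False, which is the mirror image of how the question's np.where call is written. The last line checks that s.where(keep) matches np.where(keep, s, np.nan).

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 5, 0, 7])
keep = s != 0

# Where `keep` is True the value comes from s; elsewhere from `other`
default_other = s.where(keep)      # other defaults to NaN (result is float)
custom_other = s.where(keep, -1)   # explicit replacement value

# Same result expressed with np.where: condition first, "keep" branch second
same_as_np = default_other.equals(pd.Series(np.where(keep, s, np.nan)))

print(default_other.tolist())  # [nan, 5.0, nan, 7.0]
print(custom_other.tolist())   # [-1, 5, -1, 7]
print(same_as_np)              # True
```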

So your example would become:

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
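    # Assign back: where(..., inplace=True) on DF[col] can act on a copy
    # (always under Copy-on-Write), leaving DF itself unchanged
    DF[col] = DF[col].where(~condition, np.nan)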

But if all you're trying to do is replace rows of all zeros in a specific set of columns with NA, you could do this instead:

DF.loc[condition, cols] = NA
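A self-contained sketch of the .loc approach on the question's data. One assumption beyond the answer: the columns are cast to float first, so storing NaN does not depend on pandas implicitly upcasting integer columns during assignment. Row 7 (zeros plus a 1 in column d) should come through untouched.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)

# Assumption: make the columns float up front so they can hold NaN
DF[cols] = DF[cols].astype(float)

# Boolean row mask picks the all-zero rows; the column list limits the write
DF.loc[condition, cols] = np.nan

print(DF.tail(4))
```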

EDIT

To answer your original question, np.where follows the same broadcasting rules as other array operations, so you would replace ??? with DF[col], changing your example to:

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, DF[col])
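
The same broadcasting rules also let you drop the per-column loop. This is an alternative sketch, not part of the original answer: the 1-D row mask is reshaped to a column vector with [:, None] so that it broadcasts across all four columns in a single np.where call. The name mask is chosen for illustration.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)

# condition has shape (11,); [:, None] makes it (11, 1), which broadcasts
# against the (11, 4) block: one decision per row, applied to every column
mask = condition.to_numpy()[:, None]

# NaN where the row is all zeros, otherwise the existing value (result is float)
DF[cols] = np.where(mask, np.nan, DF[cols].to_numpy())
```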

answered Sep 16 '22 at 22:09 by JaminSore


The proposed solutions work, but for a NumPy array there is a simpler way that doesn't involve a DataFrame.

A solution would be:

np_array[np.where(condition)] = value_of_condition_true_rows
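A minimal sketch of that idea applied to the question's problem, assuming a small made-up 2-D array. The array is created as float because an integer array cannot store NaN. With a 1-D row mask, np.where(condition) yields a tuple holding only row indices, so the assignment overwrites whole rows; plain boolean indexing, arr[condition] = np.nan, would do the same.

```python
import numpy as np

# float dtype so the array can hold NaN
arr = np.array([[3, 3, 0, 5],
                [0, 0, 0, 1],
                [0, 0, 0, 0]], dtype=float)

# One boolean per row: True where every entry in the row is 0
condition = (arr == 0).all(axis=1)

# np.where(condition) is a tuple of row indices, so whole rows are replaced
arr[np.where(condition)] = np.nan

print(arr)
```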

answered Sep 19 '22 at 22:09 by partizanos