Set all values in one column to NaN if the corresponding values in another column are also NaN

The goal is to keep two columns consistent: wherever column a is NaN, the corresponding value in column b should be set to NaN as well.

Having the following data frame:

df = pd.DataFrame({'a': [np.nan, 2, np.nan, 4], 'b': [11, 12, 13, 14]})

     a   b
0  NaN  11
1    2  12
2  NaN  13
3    4  14

Propagating the NaN values from column a to column b should result in:

     a    b
0  NaN  NaN
1    2   12
2  NaN  NaN
3    4   14

One way to achieve the desired behaviour is:

df.b.where(~df.a.isnull(), np.nan)

Is there any other way to maintain such a relationship?

asked Aug 06 '18 15:08 by Krzysztof Słowiński


2 Answers

You can use pd.Series.notnull to avoid having to negate your Boolean series:

df.b.where(df.a.notnull(), np.nan)

But, really, there's nothing wrong with your existing solution.
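For completeness, here is a minimal self-contained sketch of this approach. Note that where returns a new Series rather than modifying df, so the sketch assigns the result back to the column; that assignment is an addition for illustration and is not part of the one-liner above. NaN is the default replacement value of where, so the second argument can be left out.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 2, np.nan, 4], 'b': [11, 12, 13, 14]})

# where() keeps the values of b where the condition is True and
# replaces the others with NaN (the default replacement value).
# The result is a new Series, so assign it back to update the frame.
df['b'] = df['b'].where(df['a'].notnull())

print(df)
```

Column b ends up with a float dtype, because an integer column cannot hold NaN.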

answered Oct 16 '22 15:10 by jpp


Another one would be:

df.loc[df.a.isnull(), 'b'] = df.a

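It isn't shorter, but it does the job: the assignment aligns df.a on the index, so the selected rows of b receive the NaN values from a.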
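Two closely related variants, sketched under a couple of assumptions: the first assigns np.nan directly instead of df.a, and the second uses Series.mask, which is the inverse of where (it replaces values where the condition is True). The names by_loc and by_mask are only illustrative. Recent pandas releases warn about, and may reject, assigning NaN into an integer column through loc, so the sketch casts b to float first; that cast is a precaution added here, not something the answer above does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 2, np.nan, 4], 'b': [11, 12, 13, 14]})

# Variant 1: boolean-mask assignment with loc.
# Cast b to float first so that NaN fits the column dtype.
by_loc = df.copy()
by_loc['b'] = by_loc['b'].astype(float)
by_loc.loc[by_loc['a'].isnull(), 'b'] = np.nan

# Variant 2: mask() replaces values where the condition is True
# (NaN is the default replacement) and returns a new Series.
by_mask = df.copy()
by_mask['b'] = by_mask['b'].mask(by_mask['a'].isnull())

print(by_loc)
print(by_mask)
```

Both variants should produce the same frame; which one reads better is largely a matter of taste.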

answered Oct 16 '22 15:10 by zipa