
Replace dataframe column negative values with nan, in method chain

Tags: python, pandas

I want to replace all the negative numbers in column 'b' with np.nan:

  • using a method on df
  • not in place.

Here's the sample frame:

pd.DataFrame({'a': [1, 2] , 'b': [-3, 4], 'c': [5, -6]})

See this question for in-place and non-method solutions.

Hatshepsut asked Jan 18 '18 01:01



2 Answers

If assign counts as a method on df, you can recalculate the column b and assign it to df to replace the old column:

df = pd.DataFrame({'a': [1, 2] , 'b': [-3, 4], 'c': [5, -6]})

df.assign(b = df.b.where(df.b.ge(0)))
#   a    b  c
#0  1  NaN  5
#1  2  4.0 -6

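For better chaining behavior, pass a lambda to assign. The lambda receives the frame as it exists at that point in the chain, so it does not depend on the outer variable df: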

df.assign(b = lambda x: x.b.where(x.b.ge(0)))
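As a minimal sketch of why the lambda form matters in a longer chain, the example below first changes b in an earlier step (the multiplication by 10 is purely illustrative and not part of the question) and then blanks out the negatives. It also shows Series.mask, which replaces values where the condition is True and so reads as the mirror image of where.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [-3, 4], 'c': [5, -6]})

# Each lambda receives the frame produced by the previous step,
# so the second assign sees the updated b, not the original df.b.
chained = (
    df
    .assign(b=lambda x: x.b * 10)               # illustrative earlier step
    .assign(b=lambda x: x.b.where(x.b.ge(0)))   # keep b where b >= 0, else NaN
)

# Same idea with mask: replace values where the condition is True.
masked = df.assign(b=lambda x: x.b.mask(x.b.lt(0)))

print(chained)
print(masked)
print(df)  # the original frame is left unchanged
```

Both calls return new frames, so neither modifies df.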
Psidom answered Sep 30 '22 17:09


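You can use loc with a boolean mask to select the negative values in column 'b' and set them to np.nan. Note that this modifies df in place, so it is not a method-chain solution. Sample code:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [-3, 4], 'c': [5, -6]})
df['b'] = df['b'].astype(float)  # NaN needs a float column
df.loc[df['b'] < 0, 'b'] = np.nan  # b < 0 leaves zeros untouched, unlike ~(b > 0)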
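Because the loc assignment mutates the frame, it does not meet the "method on df, not in place" requirement by itself. One possible workaround, sketched here under the assumption that a small helper is acceptable, is to wrap the assignment in a function that works on a copy and call it through DataFrame.pipe. The name negatives_to_nan is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

def negatives_to_nan(frame, column):
    """Hypothetical helper: return a copy with negatives in `column` set to NaN."""
    out = frame.copy()
    out[column] = out[column].astype(float)    # NaN needs a float column
    out.loc[out[column] < 0, column] = np.nan  # mutate the copy only
    return out

df = pd.DataFrame({'a': [1, 2], 'b': [-3, 4], 'c': [5, -6]})

# pipe passes the frame as the first argument, so this slots into a chain.
result = df.pipe(negatives_to_nan, 'b')

print(result)
print(df)  # the original frame is left unchanged
```

This keeps the loc logic but costs a full copy of the frame, so the assign-based answer is likely the lighter choice for a single column.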
Avind answered Sep 30 '22 15:09