I have the following dataframe:
import pandas as pd
import numpy as np
raw_data = {
'Score1': [42, 52, -999, 24, 73],
'Score2': [-999, -999, -999, 2, 1],
'Score3': [2, 2, -999, 2, -999]}
df = pd.DataFrame(raw_data, columns = ['Score1', 'Score2', 'Score3'])
and I want to replace the -999's with NaN only in columns Score2 and Score3, leaving column Score1 unchanged. I want to reference the columns to be modified by name and they might not be consecutive.
I have tried things like:
df.loc[:,('Score2', 'Score3')].replace(-999, np.nan, inplace=True)
df
but this does not work, I assume because it is operating on the copy. Is there a way of doing this in one statement?
I looked at Pandas replacing values on specific columns but found it quite confusing, so felt a simpler example would help.
You cannnot use inplace=True
, because subset returns a Series which may have its data as a view. Modifying it in place doesn't ALWAYS propogate it back to the parent object. That's why SettingWithCopyWarning
is possible there (or raise if you set the option). You should never do this, nor is their ever a reason to do so.
df[['Score2', 'Score3']] = df[['Score2', 'Score3']].replace(-999, np.nan)
print (df)
Score1 Score2 Score3
0 42 NaN 2.0
1 52 NaN 2.0
2 -999 NaN NaN
3 24 2.0 2.0
4 73 1.0 NaN
Use
In [282]: df.replace({'Score2': -999, 'Score3': -999}, np.nan)
Out[282]:
Score1 Score2 Score3
0 42 NaN 2.0
1 52 NaN 2.0
2 -999 NaN NaN
3 24 2.0 2.0
4 73 1.0 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With