I have a dataframe (df) that looks like this:
                     environment    event
time
2017-04-28 13:08:22          NaN   add_rd
2017-04-28 08:58:40          NaN   add_rd
2017-05-03 07:59:35         test  add_env
2017-05-03 08:05:14         prod  add_env
...
Now my goal is for each add_rd in the event column, the associated NaN value in the environment column should be replaced with the string RD.
                     environment    event
time
2017-04-28 13:08:22           RD   add_rd
2017-04-28 08:58:40           RD   add_rd
2017-05-03 07:59:35         test  add_env
2017-05-03 08:05:14         prod  add_env
...
What I did so far
I stumbled across df['environment'] = df['environment'].fillna('RD'), which replaces every NaN (which is not what I am looking for); pd.isnull(df['environment']), which detects missing values; and np.where(df['environment'], x, y), which seems to be what I want but isn't working. I also tried this:
import pandas as pd
for env in df['environment']:
    if pd.isnull(env) and df['event'] == 'add_rd':
        env = 'RD'
What is missing here is an index, or some kind of iterator, to access the corresponding value in the event column.
And I tried this:
df['environment'] = np.where(pd.isnull(df['environment']), df['environment'] = 'RD', df['environment'])
SyntaxError: keyword can't be an expression
which obviously didn't work.
I took a look at several questions but couldn't build on the suggestions in the answers: Black's question, Simon's question, szli's question, and Jan Willems Tulp's question.
So, how do I replace a value in a column based on another column's values?
Use the syntax DataFrame.loc[boolean_condition, column_name] = new_value, where boolean_condition is a boolean mask over the rows, column_name is a column in the original DataFrame, and new_value is the value that replaces the old values in the rows satisfying the condition.
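You can extract the rows of a pandas DataFrame whose values satisfy a condition on another column with the DataFrame.query() method, which filters a DataFrame using a boolean expression written as a string. Note that query() returns the matching rows and does not modify the original DataFrame.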
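As a rough illustration of how query() could fit into this task, the sketch below filters the add_rd rows and then reuses their index for an assignment through .loc. The sample frame is typed in by hand to mirror the question, and the name rd_rows is only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built sample mirroring the question's frame (an assumption, not the asker's real data).
df = pd.DataFrame(
    {
        "environment": [np.nan, np.nan, "test", "prod"],
        "event": ["add_rd", "add_rd", "add_env", "add_env"],
    },
    index=pd.DatetimeIndex(
        [
            "2017-04-28 13:08:22",
            "2017-04-28 08:58:40",
            "2017-05-03 07:59:35",
            "2017-05-03 08:05:14",
        ],
        name="time",
    ),
)

# query() returns the matching rows as a new DataFrame; df itself is untouched.
rd_rows = df.query("event == 'add_rd'")

# The index labels of the filtered rows can then drive an assignment on df.
df.loc[rd_rows.index, "environment"] = "RD"
```

Since query() only selects rows, the actual replacement still goes through .loc; for a single condition like this one, a plain boolean mask is usually shorter.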
Now my goal is for each add_rd in the event column, the associated NaN-value in the environment column should be replaced with a string RD.
As per @Zero's comment, use pd.DataFrame.loc and Boolean indexing:
df.loc[df['event'].eq('add_rd') & df['environment'].isnull(), 'environment'] = 'RD'
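For anyone who wants to run the line above end to end, here is a minimal sketch on a hand-built copy of the question's data. The intermediate names is_rd, is_missing and mask are only there for readability and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hand-built sample mirroring the question's frame (an assumption, not the asker's real data).
df = pd.DataFrame(
    {
        "environment": [np.nan, np.nan, "test", "prod"],
        "event": ["add_rd", "add_rd", "add_env", "add_env"],
    },
    index=pd.DatetimeIndex(
        [
            "2017-04-28 13:08:22",
            "2017-04-28 08:58:40",
            "2017-05-03 07:59:35",
            "2017-05-03 08:05:14",
        ],
        name="time",
    ),
)

is_rd = df["event"].eq("add_rd")         # rows whose event is add_rd
is_missing = df["environment"].isnull()  # rows whose environment is NaN
mask = is_rd & is_missing                # both conditions must hold

# Assign only to the selected rows of the environment column.
df.loc[mask, "environment"] = "RD"
print(df)
```

Splitting the condition into named masks makes it easier to check each part on its own, for example with mask.sum() to see how many rows will be changed.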
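You could also consider using where:
df['environment'] = df['environment'].where(
    df['environment'].notnull() | (df['event'] != 'add_rd'), 'RD')
where keeps the values for which the condition is True and replaces the rest with the second argument, so the condition has to describe the rows that should stay unchanged.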
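The np.where attempt from the question can also be made to work. np.where(condition, a, b) takes values, not assignments, so the replacement string goes in as the second argument. The sketch below assumes a hand-built frame with one extra, hypothetical row (NaN environment with an add_env event) to show that unrelated NaN values are left alone.

```python
import numpy as np
import pandas as pd

# Hand-built sample; the last row is a hypothetical addition for illustration.
df = pd.DataFrame(
    {
        "environment": [np.nan, np.nan, "test", "prod", np.nan],
        "event": ["add_rd", "add_rd", "add_env", "add_env", "add_env"],
    }
)

# True only where the event is add_rd AND the environment is missing.
mask = df["event"].eq("add_rd") & df["environment"].isna()

# Pick 'RD' where mask is True, otherwise keep the existing value.
df["environment"] = np.where(mask, "RD", df["environment"])
print(df)
```

Note the argument order: np.where replaces where the condition is True, whereas Series.where replaces where it is False.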
Replace values in a specific column using DataFrame.loc
In [1]: import numpy as np
        import pandas as pd
In [2]: dictionary = {'time': ['2017-04-28 13:08:22', '2017-04-28 08:58:40',
                               '2017-05-03 07:59:35', '2017-05-03 08:05:14'],
                      # np.nan gives real missing values rather than the string 'NaN'
                      'environment': [np.nan, np.nan, 'test', 'prod'],
                      'event': ['add_rd', 'add_rd', 'add_env', 'add_env']}
In [3]: df = pd.DataFrame(dictionary, columns=['time', 'environment', 'event'])
        print(df)
Out [3]:
                  time  environment    event
0  2017-04-28 13:08:22          NaN   add_rd
1  2017-04-28 08:58:40          NaN   add_rd
2  2017-05-03 07:59:35         test  add_env
3  2017-05-03 08:05:14         prod  add_env
In [4]: # Set environment to 'RD' on every row whose event is add_rd
        df.loc[df['event'] == 'add_rd', 'environment'] = 'RD'
        print(df)
Out [4]:
                  time  environment    event
0  2017-04-28 13:08:22           RD   add_rd
1  2017-04-28 08:58:40           RD   add_rd
2  2017-05-03 07:59:35         test  add_env
3  2017-05-03 08:05:14         prod  add_env
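One thing to keep in mind with the session above: selecting on the event column alone overwrites every add_rd row, including rows that already have an environment. The sketch below contrasts that with a mask that also requires a missing value. The 'staging' row is a hypothetical addition that does not appear in the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second add_rd row already has an environment.
df = pd.DataFrame(
    {
        "environment": [np.nan, "staging", "test"],
        "event": ["add_rd", "add_rd", "add_env"],
    }
)

# Variant 1: condition on event only, so 'staging' is overwritten too.
unconditional = df.copy()
unconditional.loc[unconditional["event"] == "add_rd", "environment"] = "RD"

# Variant 2: also require a missing environment, so 'staging' survives.
nan_only = df.copy()
fill_mask = nan_only["event"].eq("add_rd") & nan_only["environment"].isna()
nan_only.loc[fill_mask, "environment"] = "RD"

print(unconditional)
print(nan_only)
```

Which variant is appropriate depends on whether existing environment values on add_rd rows should be preserved; the question as asked only calls for filling the NaN values.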