Given the following data frame:
import pandas as pd
import numpy as np
DF = pd.DataFrame({'COL1': ['a', 'b', 'b'],
                   'COL2': [0, np.nan, 1]})
DF
  COL1  COL2
0    a     0
1    b   NaN
2    b     1
I want to be able to assign a new column COL3 that has a value of 2 for every row where COL1 is b and COL2 is not null, and 0 otherwise.
The desired result is as follows:
  COL1  COL2  COL3
0    a     0     0
1    b   NaN     0
2    b     1     2
Thanks in advance!
You can extract a column of a pandas DataFrame based on the value of another column by using the DataFrame.query() method. query() filters the rows of a DataFrame with a boolean expression written as a string. For example, a query can return the Courses column for the rows where the Fee column equals 25000.
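As a minimal sketch of the query() approach described above: the course names and fee values below are made up for illustration and do not come from the original post.

```python
import pandas as pd

# Hypothetical sample data; the course names and fees are illustrative only.
df = pd.DataFrame({'Courses': ['Spark', 'PySpark', 'Hadoop'],
                   'Fee': [22000, 25000, 23000]})

# Keep the rows where Fee equals 25000, then select the Courses column.
courses = df.query('Fee == 25000')['Courses']
print(courses.tolist())
```

query() returns a filtered copy, so the original df is left unchanged.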
In SQL, you can update a column based on another column using a CASE statement. The CASE statement specifies a new value of the first_name column for each value of the id column. This is more flexible than using a WHERE clause alone, because with a WHERE clause a single UPDATE can only change the column to one new value.
You can replace the values of all or selected columns of a pandas DataFrame based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or by a boolean array. It can be used both to read and to modify the values of a DataFrame.
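Applied to the data frame from the question, the loc[] approach might look like the sketch below. The variable name mask is introduced here for illustration; the default of 0 follows the desired result shown in the question.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'COL1': ['a', 'b', 'b'],
                   'COL2': [0, np.nan, 1]})

# Start with the default value for every row.
DF['COL3'] = 0

# Boolean mask: COL1 is 'b' and COL2 is not null.
mask = (DF['COL1'] == 'b') & DF['COL2'].notnull()

# Overwrite COL3 only in the rows selected by the mask.
DF.loc[mask, 'COL3'] = 2
print(DF['COL3'].tolist())
```

This modifies DF in place and avoids calling a Python function once per row.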
If you want to modify the value of a single cell, for example the cell at row 0 and column "A", you can use either of these: df.iat[0, 0] = 2 (by integer position) or df.at[0, 'A'] = 2 (by label).
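A small sketch of both single-cell setters, using a hypothetical two-column frame that is not part of the original post:

```python
import pandas as pd

# Hypothetical frame used only to illustrate single-cell assignment.
df = pd.DataFrame({'A': [1, 3], 'B': [5, 7]})

# iat addresses a cell by integer position (row 0, column 0).
df.iat[0, 0] = 2

# at addresses a cell by label (row label 1, column label 'A').
df.at[1, 'A'] = 4
print(df['A'].tolist())
```

Both accessors target exactly one cell; for conditional updates across many rows, a boolean mask with loc[] is usually the better fit.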
To apply a function to every row, pass axis=1 to apply(). By default it uses axis=0, meaning the function is applied to each column. By applying a function to each row, you can create a new column from the values in that row, update the row, and so on.
DataFrame.assign(): the assign() function is used to assign new columns to a DataFrame. It returns a new object with all the original columns in addition to the new ones. Existing columns that are re-assigned will be overwritten. The new column names are passed as keyword arguments.
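For the data frame in the question, assign() could be used roughly as follows. This is a sketch that combines assign() with numpy.where; the name result is introduced here for illustration.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'COL1': ['a', 'b', 'b'],
                   'COL2': [0, np.nan, 1]})

# assign() returns a new DataFrame; the keyword name becomes the column name.
# The callable receives the DataFrame and returns the values of the new column.
result = DF.assign(
    COL3=lambda d: np.where((d['COL1'] == 'b') & d['COL2'].notnull(), 2, 0)
)
print(result['COL3'].tolist())
```

Because assign() does not modify DF, the result has to be bound to a name (or reassigned to DF) to keep the new column.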
Define a function to return your value based on other columns.
def value_handle(row):
    # Return 2 when COL1 is 'b' and COL2 is not null, otherwise 0.
    if row['COL1'] == 'b' and not pd.isnull(row['COL2']):
        return 2
    else:
        return 0
Then call the new function when introducing the new column.
DF['COL3'] = DF.apply(value_handle, axis=1)
This can be achieved using the apply method on the DataFrame. You'll need to pass in a function to apply to each row and set axis to 1 so that it is called once per row instead of once per column.
Here's a working example:
def row_handler(row):
    # np.isnan works here because COL2 holds floats.
    if row['COL1'] == 'b' and not np.isnan(row['COL2']):
        return 2
    return 0
DF['COL3'] = DF.apply(row_handler, axis=1)
Which gives this:
>>> print(DF)
  COL1  COL2  COL3
0    a     0     0
1    b   NaN     0
2    b     1     2
You can use numpy.where with isin and notnull:
DF['COL3'] = np.where(DF['COL1'].isin(['b']) & DF['COL2'].notnull(), 2, 0)
print(DF)
  COL1  COL2  COL3
0    a     0     0
1    b   NaN     0
2    b     1     2