I have a pandas dataframe like this:
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [100, 200, 300, 400, 500, 600]})
I want to create a new column that holds some value if certain conditions are met. The problem is that there are multiple conditions combined with & and |. I know I can do this with only two conditions per call and then several df.loc calls, but since my actual dataset is quite large and the variables can take many different values, I'd like to know whether it is possible to do this in a single df.loc call. I also tried np.where before, but found df.loc generally easier, so it would be nice if I could stick with it.
The code I tried is
df.loc[(df.A == 1) | (df.A == 2) & (df.B == 600) | (df.B == 200), "C"] = "1or2and600or200"
which gives me
print(df)
   A    B                C
0  1  100  1or2and600or200
1  2  200  1or2and600or200
2  3  300              NaN
3  4  400              NaN
4  5  500              NaN
5  6  600              NaN
This, however, is not what I want; df.loc seemingly only considers the first two conditions. In this example, I would want the value 1or2and600or200 to appear only in the first row, not in the second one. Is this possible?
Many thanks.
Just a small note that OP could also use .isin(...) to have slightly cleaner conditions.
Your approach is fine; you just need an extra pair of parentheses around each group of conditions, because & binds more tightly than |.
df.loc[((df.A == 1) | (df.A == 2)) & ((df.B == 600) | (df.B == 200)), "C"] = "1or2and600or200"
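To see why the extra parentheses matter, here is a minimal sketch using the DataFrame from the question. Without them, the condition is evaluated as (A == 1) | ((A == 2) & (B == 600)) | (B == 200). The names mask_original, mask_implicit, and mask_grouped are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [100, 200, 300, 400, 500, 600]})

# Condition as written in the question: & is applied before |
mask_original = (df.A == 1) | (df.A == 2) & (df.B == 600) | (df.B == 200)

# The same expression with the implicit grouping spelled out
mask_implicit = (df.A == 1) | ((df.A == 2) & (df.B == 600)) | (df.B == 200)

# Intended grouping: (A is 1 or 2) and (B is 600 or 200)
mask_grouped = ((df.A == 1) | (df.A == 2)) & ((df.B == 600) | (df.B == 200))

print(mask_original.tolist())  # [True, True, False, False, False, False]
print(mask_grouped.tolist())   # [False, True, False, False, False, False]
```

The first mask selects rows 0 and 1, which matches the output shown in the question; the grouped mask selects only row 1.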
You can also use .isin for a clearer and more concise condition, as suggested by @AndrewF:
df.loc[df.A.isin([1, 2]) & df.B.isin([600, 200]), 'C'] = "1or2and600or200"
Also note that, for your given condition, the value ends up in the second row rather than the first, because that is the only row where A is 1 or 2 and B is 600 or 200 (A is 2 and B is 200).
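As a quick check, the .isin version can be run end to end on the sample data from the question. This is only a sketch; the variable name matched is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [100, 200, 300, 400, 500, 600]})

# Single .loc call; rows that do not match are left as missing in the new column C
df.loc[df.A.isin([1, 2]) & df.B.isin([600, 200]), "C"] = "1or2and600or200"

# Index labels of the rows that received the value
matched = df.index[df.C.notna()].tolist()
print(matched)  # [1]
```

Only index 1 (A is 2, B is 200) satisfies both parts of the condition. If the goal really is to tag only the first row, the condition itself would have to change, for example by including 100 in the list of B values.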