
df.loc more than 2 conditions

I have a pandas dataframe like this:

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [100, 200, 300, 400, 500, 600]})

I want to create a new column that holds some value when certain conditions are met. The difficulty is that there are several conditions combined with & and |. I know I can do this with two conditions at a time and multiple df.loc calls, but my actual dataset is large and the variables can take many different values, so I'd like to know whether this is possible in a single df.loc call. I also tried np.where, but I generally find df.loc easier, so I'd prefer to stick with it.

The code I tried is

df.loc[(df.A == 1) | (df.A == 2) & (df.B == 600) | (df.B == 200), "C"] = "1or2and600or200"

which gives me

print(df)  
   A    B                C
0  1  100  1or2and600or200
1  2  200  1or2and600or200
2  3  300              NaN
3  4  400              NaN
4  5  500              NaN
5  6  600              NaN

This, however, is not what I want; df.loc seems to consider only the first two conditions. In this example, I would want the value 1or2and600or200 to appear only in the first row, not in the second one. Is this possible?

Much thanks.

TheBob413849 asked Jan 17 '19 11:01


1 Answer

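Your approach is fine, but the condition needs extra parentheses. In Python, & binds more tightly than |, so without them the expression is evaluated as (df.A == 1) | ((df.A == 2) & (df.B == 600)) | (df.B == 200). Group each "or" pair explicitly: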

df.loc[((df.A == 1) | (df.A == 2)) & ((df.B == 600) | (df.B == 200)), "C"] = "1or2and600or200"
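To see the effect of the parentheses, here is a minimal sketch using the question's data. The variable names (unparenthesized, implicit_grouping, intended) are illustrative only and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [100, 200, 300, 400, 500, 600]})

# The mask from the question, written without extra parentheses.
unparenthesized = (df.A == 1) | (df.A == 2) & (df.B == 600) | (df.B == 200)

# How Python groups that expression: & is evaluated before |.
implicit_grouping = (df.A == 1) | ((df.A == 2) & (df.B == 600)) | (df.B == 200)

# The grouping the question intended: (A is 1 or 2) AND (B is 600 or 200).
intended = ((df.A == 1) | (df.A == 2)) & ((df.B == 600) | (df.B == 200))

print(unparenthesized.tolist() == implicit_grouping.tolist())  # True
print(unparenthesized.tolist())  # [True, True, False, False, False, False]
print(intended.tolist())         # [False, True, False, False, False, False]
```

The first mask selects rows 0 and 1, which matches the output shown in the question; the explicitly grouped mask selects only row 1.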

You can also use .isin for a clearer and more concise condition, as suggested by @AndrewF:

df.loc[df.A.isin([1, 2]) & df.B.isin([600, 200]), 'C'] = "1or2and600or200"

Also note that with your stated condition the value ends up in the second row (index 1), not the first: that is the only row where A is 1 or 2 and B is 600 or 200 (A is 2 and B is 200).
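As a quick check of which row is selected, here is a small sketch of the .isin variant run end to end on the question's data. The names mask and matched_rows are hypothetical helpers added for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [100, 200, 300, 400, 500, 600]})

# isin collapses each group of "or" comparisons into one membership test,
# so only a single & remains and no precedence question arises.
mask = df.A.isin([1, 2]) & df.B.isin([600, 200])
df.loc[mask, "C"] = "1or2and600or200"

# Index labels of the rows where the condition held.
matched_rows = df.index[mask].tolist()

print(df)
print(matched_rows)  # [1]
```

Building the mask as a separate variable is optional, but it makes the condition easy to inspect before assigning and to reuse if more columns need the same filter.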

meW answered Oct 03 '22 23:10