Pandas Data Frame Filtering Multiple Conditions

Question

I have the following data frame:

import pandas as pd

df = pd.DataFrame([[1990,7,1000],[1990,8,2500],[1990,9,2500],[1990,9,1500],[1991,1,250],[1991,2,350],[1991,3,350],[1991,7,450]], columns = ['year','month','data1'])

year    month    data1
1990      7      1000
1990      8      2500
1990      9      2500
1990      9      1500
1991      1      250
1991      2      350
1991      3      350
1991      7      450

I would like to filter out the rows whose month/year is 07/1990, 08/1990, or 01/1991. I can handle each month/year combination separately, as follows:

df = df.loc[(df.year != 1990) | (df.month != 7)]

But this is not efficient when there are many month/year combinations. Is there a more efficient way to do this?

Many thanks.

Dani Mesejo · Accepted Answer

You could do:

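# Turn each row's (year, month) into a tuple, then keep the rows whose tuple is not in the list
mask = ~df[['year', 'month']].apply(tuple, axis=1).isin([(1990, 7), (1990, 8), (1991, 1)])
print(df[mask])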

Output

   year  month  data1
2  1990      9   2500
3  1990      9   1500
5  1991      2    350
6  1991      3    350
7  1991      7    450
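
To make the intermediate step visible, here is a self-contained sketch of the same approach, split into stages. The names `excluded`, `pairs`, and `filtered` are introduced here for illustration only and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [[1990, 7, 1000], [1990, 8, 2500], [1990, 9, 2500], [1990, 9, 1500],
     [1991, 1, 250], [1991, 2, 350], [1991, 3, 350], [1991, 7, 450]],
    columns=['year', 'month', 'data1'])

# Hypothetical name: the (year, month) combinations to drop.
excluded = [(1990, 7), (1990, 8), (1991, 1)]

# Each row's (year, month) becomes a single tuple, so the two columns
# are matched together rather than independently.
pairs = df[['year', 'month']].apply(tuple, axis=1)

# True for rows whose (year, month) tuple is NOT in the exclusion list.
mask = ~pairs.isin(excluded)
filtered = df[mask]
print(filtered)
```

Adding more combinations only means extending the `excluded` list; the filtering line itself stays the same.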

Pierre D · Answer

This is even faster (roughly 3x faster than @DaniMesejo's elegant version, which applies tuple). However, it relies on knowing that months are bounded (well below 100), so it is less generalizable:

mask = ~(df.year*100 + df.month).isin({199007, 199008, 199101})
df[mask]

# out:
   year  month  data1
2  1990      9   2500
3  1990      9   1500
5  1991      2    350
6  1991      3    350
7  1991      7    450
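
The caveat about months being bounded below 100 can be illustrated with a small sketch. The helper `encode` and its `base` parameter are hypothetical, added here only to show why the multiplier matters; they are not part of the answer's code.

```python
def encode(year, month, base=100):
    """Hypothetical helper: pack (year, month) into a single integer."""
    return year * base + month

# With base 100, the encoding matches the values used in the answer.
assert encode(1990, 7) == 199007

# If the multiplier is not larger than the biggest month value,
# two different (year, month) pairs can map to the same integer.
collide_a = encode(1990, 11, base=10)  # 19900 + 11
collide_b = encode(1991, 1, base=10)   # 19910 + 1
print(collide_a == collide_b)  # True: both are 19911

# With base 100 the same two pairs stay distinct.
print(encode(1990, 11) == encode(1991, 1))  # False: 199011 vs 199101
```

The same reasoning applies to any second column: the multiplier has to exceed the largest value that column can take.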

Why is this 3x faster than the tuple solution? Tricks for speed:

All vectorized operations and no apply.
No string operations, all ints.
Using .isin() with a set as argument (not a list).
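
A rough way to check the speed claim on your own machine is sketched below. The function names `tuple_mask` and `int_mask`, the repeat factor of 1000, and the use of `timeit` are assumptions made for this illustration; actual timings depend on the data size and the pandas version, so no figures are given here.

```python
import timeit

import pandas as pd

base = pd.DataFrame(
    [[1990, 7, 1000], [1990, 8, 2500], [1990, 9, 2500], [1990, 9, 1500],
     [1991, 1, 250], [1991, 2, 350], [1991, 3, 350], [1991, 7, 450]],
    columns=['year', 'month', 'data1'])

# Repeat the sample rows so the timings are measurable (1000 is arbitrary).
big = pd.concat([base] * 1000, ignore_index=True)


def tuple_mask(df):
    # Row-wise apply: builds one Python tuple per row.
    return ~df[['year', 'month']].apply(tuple, axis=1).isin(
        [(1990, 7), (1990, 8), (1991, 1)])


def int_mask(df):
    # Vectorized integer arithmetic, then a membership test against a set.
    return ~(df.year * 100 + df.month).isin({199007, 199008, 199101})


# Both approaches should select exactly the same rows.
assert tuple_mask(big).equals(int_mask(big))

t_tuple = timeit.timeit(lambda: tuple_mask(big), number=5)
t_int = timeit.timeit(lambda: int_mask(big), number=5)
print(f"tuple: {t_tuple:.4f}s  int: {t_int:.4f}s")
```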

Tags: python, pandas, filter

teteh May
