 

Equivalent of "in" keyword or subquery in pandas

Tags:

python

pandas

I have a Series object (let's call this MySeries) which contains a list of integers.

I also have a separate dataframe (say MyDataFrame), which includes a column/field called MyField.

I want to select all records from MyDataFrame where the value in MyField is in MySeries.

The equivalent SQL would be:

Select * from MyDataFrame 
where MyField in 
    (select * from MySeries)

Could anyone suggest the best way to do this?

Thanks very much for any help.
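Part of why the SQL has no literal translation: Python's own in keyword, applied to a pandas Series, checks the index labels rather than the values. The sketch below uses made-up data (the question does not show what MySeries or MyDataFrame actually contain) to illustrate the difference. Series.isin is the vectorised value test that works on a whole column at once.

```python
import pandas as pd

# Hypothetical data: the real contents of MySeries are not shown in the question.
MySeries = pd.Series([10, 20, 30])   # default index labels are 0, 1, 2

# Python's `in` on a Series tests the *index labels*, not the values.
print(10 in MySeries)          # False: 10 is not an index label
print(1 in MySeries)           # True: 1 is an index label

# To test membership among the values, look at the values explicitly...
print(10 in MySeries.values)   # True

# ...and to test a whole column at once, use the vectorised Series.isin.
MyDataFrame = pd.DataFrame({"MyField": [5, 10, 30]})
mask = MyDataFrame["MyField"].isin(MySeries)
print(mask.tolist())           # [False, True, True]
```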

asked Oct 29 '13 at 19:10 by RobinL


People also ask

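Are at and loc the same in pandas?

No. at accesses a single element by row and column label, while loc may return a single value, a Series, or a DataFrame, depending on what you select. at does not always return a single value, though: if the requested row label occurs more than once in the index, it can return an array of values instead.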
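A small illustration of the difference, using a made-up DataFrame with a unique string index (so at is guaranteed to return a scalar here):

```python
import pandas as pd

# Hypothetical frame with a unique index.
df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]}, index=["r1", "r2", "r3"])

# at: exactly one row label and one column label -> one scalar value.
print(df.at["r2", "B"])                                  # y

# loc: the shape of the result follows the shape of the selection.
print(df.loc["r2", "B"])                                 # y (scalar, like at)
print(type(df.loc["r2"]).__name__)                       # Series (one row, all columns)
print(type(df.loc[["r1", "r3"], ["A", "B"]]).__name__)   # DataFrame
```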

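Is pandas query faster than loc?

It depends on the size of the data. In the comparison quoted here, query was more efficient than loc on one DataFrame (DF1), but on DF2 (2K records x 6 columns) loc was much more efficient than query. query tends to pay off mainly on large frames, where evaluating the expression with numexpr outweighs the overhead of parsing the query string.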
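A rough sketch for checking this on your own data. The frame is random and only mimics DF2's shape; the timings it prints depend on the machine, the pandas version, and whether numexpr is installed, so no numbers are given here. It also confirms that the two spellings select the same rows.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical frame roughly the size of DF2 (2K rows x 6 columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(2_000, 6)), columns=list("abcdef"))

# The same row filter written both ways; the results are identical.
via_loc = df.loc[(df["a"] > 50) & (df["b"] < 20)]
via_query = df.query("a > 50 and b < 20")
print(via_loc.equals(via_query))   # True

# Time each version; measure on your own data rather than trusting one benchmark.
candidates = [
    ("loc", lambda: df.loc[(df["a"] > 50) & (df["b"] < 20)]),
    ("query", lambda: df.query("a > 50 and b < 20")),
]
for name, stmt in candidates:
    secs = timeit.timeit(stmt, number=200)
    print(f"{name}: {secs:.4f} s for 200 runs")
```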

How do I write a SELECT query in pandas?

The SQL SELECT statement selects columns of data from a table. To do the same thing in pandas, index the DataFrame with square brackets and pass a list of the column names you want to select. The SELECT DISTINCT statement returns only unique rows from a table.
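Both statements on a made-up table; in pandas, the drop_duplicates() method plays the role of DISTINCT:

```python
import pandas as pd

# Hypothetical table.
df = pd.DataFrame({
    "name": ["ann", "bob", "ann", "cy"],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
    "age": [30, 40, 30, 25],
})

# SELECT name, city FROM df
cols = df[["name", "city"]]

# SELECT DISTINCT name, city FROM df
distinct = df[["name", "city"]].drop_duplicates()
print(distinct)
#   name  city
# 0  ann  Oslo
# 1  bob  Rome
# 3   cy  Rome
```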


1 Answer

You can use the isin() method:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': list('ABCDE')})
>>> f = pd.Series([1, 2])
>>> df[df['A'].isin(f)]
   A  B
0  1  A
1  2  B

So first you get a boolean filter Series:

>>> df['A'].isin(f)
0     True
1     True
2    False
3    False
4    False
Name: A, dtype: bool

and then use it to filter your DataFrame, as in df[df['A'].isin(f)] above.
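Both steps written out with the answer's df and f, plus the same filter expressed with DataFrame.query, where @ refers to a local Python variable. Converting f to a plain list first is just a choice made in this sketch:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": list("ABCDE")})
f = pd.Series([1, 2])

# Step 1: boolean mask, True where df['A'] appears among the values of f.
mask = df["A"].isin(f)

# Step 2: keep only the rows where the mask is True.
filtered = df[mask]            # df.loc[mask] gives the same result

# The same filter as a query string; @values refers to the local list below.
values = f.tolist()
via_query = df.query("A in @values")

print(filtered.equals(via_query))   # True
```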

answered Oct 14 '22 at 20:10 by Roman Pekar