Select rows in Dataframe where one column's value is in another list column

Question

I'm trying the following

df = pd.DataFrame({'col1': [1,2,3], 'col2': [[1,2], [3,4], [3,2]]})
df
     col1    col2
0     1      [1, 2]
1     2      [3, 4]
2     3      [3, 2]

and I would like to select the rows where the col1 value is in the col2 list:

df[df.col1.isin(df.col2)]
Empty DataFrame
Columns: [col1, col2]
Index: []

But I get an empty DataFrame. Why doesn't the isin function work?

tdy · Accepted Answer

Why doesn't the isin function work?

Series.isin takes a set or list-like of values. When you call:

df.col1.isin(some_set_or_list)

it tests whether each value of col1 appears anywhere in the entire some_set_or_list, which in this case is [[1,2],[3,4],[3,2]].
It does not test whether col1[0] is in some_set_or_list[0], whether col1[1] is in some_set_or_list[1], etc. In fact, some_set_or_list can be a totally different length than col1.
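
To illustrate the point about lengths, here is a minimal sketch. The flat list `candidates` is a made-up example, not something from the question; it only shows that isin checks each value of col1 against the whole collection, whatever its length:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [[1, 2], [3, 4], [3, 2]]})

# Hypothetical lookup values: five items, while col1 has only three rows.
candidates = [3, 4, 5, 6, 7]

# Each col1 value is tested against all of candidates,
# not against a per-row counterpart.
mask = df.col1.isin(candidates)
print(mask.tolist())  # [False, False, True]
```

Only the last row matches, because 3 is the only col1 value that occurs anywhere in `candidates`.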

For example, if col1's first value were [3,4] instead:

df = pd.DataFrame({'col1': [[3,4],2,3], 'col2': [[1,2], [3,4], [3,2]]})

#      col1    col2
# 0  [3, 4]  [1, 2]
# 1       2  [3, 4]
# 2       3  [3, 2]

isin would give you True for the first row since [3,4] is somewhere in col2 as a whole (not element-wise):

df.col1.isin(df.col2)

# 0     True
# 1    False
# 2    False
# Name: col1, dtype: bool

What you're actually trying to do is a row-wise test, which is what Rob's answer does:

df.apply(lambda row: row.col1 in row.col2, axis=1)

Rob Raymond · Answer

This is a simple case of apply(axis=1): the lambda returns a boolean for each row, and the resulting boolean Series can then be used as a mask in loc[].

import pandas as pd

df = pd.DataFrame({'col1': [1,2,3], 'col2': [[1,2], [3,4], [3,2]]})

df.loc[df.apply(lambda r: r["col1"] in r["col2"], axis=1)]

output

   col1    col2
0     1  [1, 2]
2     3  [3, 2]
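
As an alternative sketch (not part of the answer above), the same row-wise membership test can be written with zip and a list comprehension. This avoids building a Series for every row, which may help on larger frames; the variable names `mask` and `result` are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [[1, 2], [3, 4], [3, 2]]})

# Pair each col1 value with the list in the same row and test membership.
mask = [value in values for value, values in zip(df['col1'], df['col2'])]

# A plain list of booleans works as a row mask in loc[].
result = df.loc[mask]
print(result)
```

This keeps the rows with index 0 and 2, the same rows selected by the apply version.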
