I'm trying the following:
df = pd.DataFrame({'col1': [1,2,3], 'col2': [[1,2], [3,4], [3,2]]})
df
col1 col2
0 1 [1, 2]
1 2 [3, 4]
2 3 [3, 2]
and I would like to select the rows where the value of col1 is in the list in col2:
df[df.col1.isin(df.col2)]
Empty DataFrame
Columns: [col1, col2]
Index: []
But I get an empty DataFrame. Why doesn't the isin function work?
Why doesn't the isin function work?
Series.isin takes a set or list-like. When you call:
df.col1.isin(some_set_or_list)
it just tests whether each value of col1 is in the entire some_set_or_list, which in this case is [[1,2],[3,4],[3,2]].
It does not test whether col1[0] is in some_set_or_list[0], whether col1[1] is in some_set_or_list[1], etc. In fact, some_set_or_list can be a totally different length than col1.
For example, if col1's first value were [3,4] instead:
df = pd.DataFrame({'col1': [[3,4],2,3], 'col2': [[1,2], [3,4], [3,2]]})
# col1 col2
# 0 [3, 4] [1, 2]
# 1 2 [3, 4]
# 2 3 [3, 2]
isin would give you True for the first row, since [3,4] is somewhere in col2 as a whole (not element-wise):
df.col1.isin(df.col2)
# 0 True
# 1 False
# 2 False
# Name: col1, dtype: bool
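To see the "whole collection, not element-wise" behaviour without relying on list-valued cells, here is a minimal sketch using plain integers. The names s, other, whole, positional and short are just illustrative. It contrasts isin with a positional == comparison against the same values.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
other = [3, 2, 1]

# isin: is each value of s found anywhere in `other`?
whole = s.isin(other)
print(whole.tolist())        # [True, True, True]

# ==: does each value equal the value at the same position?
positional = s == pd.Series(other)
print(positional.tolist())   # [False, True, False]

# The collection passed to isin may have any length; it need not line up with s.
short = s.isin([3])
print(short.tolist())        # [False, False, True]
```

Because isin ignores positions entirely, it cannot express "is col1 in this row's col2 list"; that needs a row-wise test.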
What you're actually trying to do is a row-wise test, which is what Rob's answer does:
df.apply(lambda row: row.col1 in row.col2, axis=1)
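The same row-wise test can also be written without apply by zipping the two columns together. This is a sketch of an alternative, not what the answer above uses. It typically runs faster than apply(axis=1) because it does not build a Series for every row, though the difference only matters on larger frames.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [[1, 2], [3, 4], [3, 2]]})

# Pair each row's col1 value with that same row's col2 list and test membership.
mask = [value in items for value, items in zip(df['col1'], df['col2'])]

# A list of booleans with one entry per row can be used directly as a row filter.
result = df[mask]
print(result)
```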
This is a simple case for apply(axis=1): have the function return a boolean for each row, then use the result as a mask in loc[].
import pandas as pd
df = pd.DataFrame({'col1': [1,2,3], 'col2': [[1,2], [3,4], [3,2]]})
df.loc[df.apply(lambda r: r["col1"] in r["col2"], axis=1)]
| | col1 | col2 |
|---|---|---|
| 0 | 1 | [1, 2] |
| 2 | 3 | [3, 2] |
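Another way to build the same loc[] mask, sketched here as an alternative rather than either answer's method, is to give each list element its own row with explode() (available since pandas 0.25), compare it with col1, and collapse back to one boolean per original row. The names exploded and matches are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [[1, 2], [3, 4], [3, 2]]})

# explode() puts each list element on its own row, repeating the original
# index label and the col1 value alongside it.
exploded = df.explode('col2')

# Compare col1 with each exploded element, then reduce to one boolean per
# original index label: True if any element of that row's list matched.
matches = (exploded['col1'] == exploded['col2']).groupby(level=0).any()

# The boolean Series aligns on the index, so it works as a loc[] mask.
print(df.loc[matches])
```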