How to check if all the elements in list are present in pandas column

I have a dataframe and a list:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                   'char': [['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b', 'c'], [], ['c', 'a', 'd'], ['c', 'd'], ['a']]})

names = ['a','c']

I want to select only the rows whose char list contains both a and c (the order doesn't matter).

Expected Output:

       char  id
1  [a, b, c]   2
2     [a, c]   3
5  [c, a, d]   6

My Efforts

# Collect the index labels of rows whose 'char' list contains every name
true_indices = []
for idx, row in df.iterrows():
    if all(name in row['char'] for name in names):
        true_indices.append(idx)

# Select those rows
ids = df[df.index.isin(true_indices)]

This gives the correct output, but it is too slow on a large dataset, so I am looking for a more efficient solution.

You can build a set from the list of names for faster lookups, and use set.issubset to check whether every element of the set is contained in each list of the column:

names = set(['a','c'])
df[df['char'].map(names.issubset)]

   id       char
1   2  [a, b, c]
2   3     [a, c]
5   6  [c, a, d]
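A minimal check of the behaviour this answer relies on, using the question's sample data: set.issubset accepts any iterable (here, each list cell), ignores order, and the bound method can be handed straight to Series.map, which calls it once per cell.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                   'char': [['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b', 'c'],
                            [], ['c', 'a', 'd'], ['c', 'd'], ['a']]})

names = {'a', 'c'}

# set.issubset accepts any iterable, so a list cell can be passed directly
print(names.issubset(['c', 'a', 'd']))  # True: order does not matter
print(names.issubset(['a', 'a']))       # False: 'c' is missing

# Series.map calls the bound method names.issubset once per cell
mask = df['char'].map(names.issubset)
print(mask.tolist())  # [False, True, True, False, False, True, False, False]
print(df[mask])
```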

Use a list comprehension with issubset:

mask = [set(names).issubset(x) for x in df['char']]
df = df[mask]
print(df)
   id       char
1   2  [a, b, c]
2   3     [a, c]
5   6  [c, a, d]
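For comparison, here is a sketch of a different technique that neither answer uses: flatten the lists with Series.explode (assumed available, pandas 0.25 or later), keep only the required values, and count the distinct matches per original row. It assumes the DataFrame index is unique, since the groupby is on the repeated index labels. It is not necessarily faster than the issubset approaches; measure on your own data.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                   'char': [['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b', 'c'],
                            [], ['c', 'a', 'd'], ['c', 'd'], ['a']]})
names = ['a', 'c']
required = set(names)

# One row per list element; each element keeps its original index label.
# Empty lists become a single NaN entry, which isin() marks as False.
exploded = df['char'].explode()
hits = exploded[exploded.isin(required)]

# Count distinct required values per original row, so ['a', 'a'] counts once
counts = hits.groupby(level=0).nunique()

# Rows with no hits at all are absent from counts, so fill them with 0
mask = counts.reindex(df.index, fill_value=0) == len(required)
print(df[mask])
```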

Another solution with Series.map:

df = df[df['char'].map(set(names).issubset)]
print(df)
   id       char
1   2  [a, b, c]
2   3     [a, c]
5   6  [c, a, d]
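If the column may contain missing values, `set.issubset(None)` or `set.issubset(float('nan'))` raises a TypeError because those are not iterable. The wrapper below is a hypothetical helper (the name rows_containing_all is illustrative, not a pandas API) that applies the same Series.map idea but treats non-collection cells as non-matches.

```python
import pandas as pd


def rows_containing_all(frame, column, required):
    """Hypothetical helper: keep rows whose cell in `column` contains every item of `required`.

    Cells that are not a list, tuple or set (e.g. None or NaN) count as non-matches
    instead of raising a TypeError inside issubset.
    """
    required = set(required)

    def has_all(cell):
        return isinstance(cell, (list, tuple, set)) and required.issubset(cell)

    # astype(bool) keeps the mask boolean even when the frame is empty
    return frame[frame[column].map(has_all).astype(bool)]


df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'char': [['a', 'c'], None, ['c', 'a', 'd'], ['a']]})
print(rows_containing_all(df, 'char', ['a', 'c']))
```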

Performance depends on the number of rows and the number of matching values:

df = pd.concat([df] * 10000, ignore_index=True)

In [270]: %timeit df[df['char'].apply(lambda x: set(names).issubset(x))]
45.9 ms ± 2.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [271]: %%timeit
     ...: names = set(['a','c'])
     ...: [names.issubset(set(row)) for _, row in df.char.items()]
     ...: 
46.7 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [272]: %%timeit
     ...: df[[set(names).issubset(x) for x in df['char']]]
     ...: 
45.6 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [273]: %%timeit
     ...: df[df['char'].map(set(names).issubset)]
     ...: 
18.3 ms ± 2.96 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [274]: %%timeit
     ...: n = set(names)
     ...: df[df['char'].map(n.issubset)]
     ...: 
16.6 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [279]: %%timeit
     ...: names = set(['a','c'])
     ...: m = [names.issubset(i) for i in df.char.values.tolist()]
     ...: 
19.2 ms ± 317 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
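The timings above come from IPython's %timeit. To reproduce a comparison outside IPython, the sketch below uses the standard timeit module instead. It assumes the original 8-row sample repeated 10,000 times (rather than the already-filtered df) and first checks that all variants select the same rows. No timings are shown because they depend on the machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                     'char': [['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b', 'c'],
                              [], ['c', 'a', 'd'], ['c', 'd'], ['a']]})
df = pd.concat([base] * 10000, ignore_index=True)
names = ['a', 'c']
required = set(names)


def with_apply():
    # Builds a new set for every row, as in the first %timeit above
    return df[df['char'].apply(lambda x: set(names).issubset(x))]


def with_comprehension():
    return df[[required.issubset(x) for x in df['char']]]


def with_map():
    return df[df['char'].map(required.issubset)]


candidates = {'apply': with_apply, 'comprehension': with_comprehension, 'map': with_map}

# Sanity check: every variant must select the same rows before timing them
reference = with_map()
for fn in candidates.values():
    assert fn().index.equals(reference.index)

for label, fn in candidates.items():
    # Best of 3 repeats, each running the function 5 times
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f'{label:>13}: {best * 1000:.1f} ms per call')
```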

Tags: python, python-3.x, pandas

Sociopath

2 Answers: yatu, jezrael
