(Pandas): What is the difference between isin() and str.contains()?

I want to know if a specific string is present in some columns of my dataframe (a different string for each column). From what I understand isin() is written for dataframes but can work for Series as well, while str.contains() works better for Series.

I don't understand how I should choose between the two. (I searched for similar questions but didn't find any explanation on how to choose between the two.)

Asked by Baptiste, Oct 31 '18 at 08:10



1 Answer

.isin checks if each value in the column is contained in a list of arbitrary values. Roughly equivalent to value in [value1, value2].

str.contains checks if arbitrary values are contained in each value in the column. Roughly equivalent to substring in large_string.

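In other words, both return one boolean per element, but they test different things. .isin compares each whole value against a collection of candidate values and is available for all data types. str.contains searches inside each value for a substring or regex pattern, so it makes sense only when dealing with strings (or values converted to strings first, for example with astype(str)).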
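To make the "value in list" versus "substring in string" analogy concrete, here is a minimal sketch that puts each pandas call next to its plain-Python counterpart. The sample Series are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series(["apple", "pineapple", "banana"])

# isin: is the WHOLE value one of the candidates?
whole_match = s.isin(["apple", "banana"])
# Plain-Python counterpart: value in [value1, value2]
whole_match_py = [value in ["apple", "banana"] for value in s]

# str.contains: does the value CONTAIN the text?
# regex=False treats "apple" as a literal substring, not a pattern.
sub_match = s.str.contains("apple", regex=False)
# Plain-Python counterpart: substring in large_string
sub_match_py = ["apple" in value for value in s]

print(whole_match.tolist())  # [True, False, True]
print(sub_match.tolist())    # [True, True, False]

# isin is not limited to strings; it works on numbers too.
nums = pd.Series([1, 2, 3])
print(nums.isin([2, 3]).tolist())  # [False, True, True]
```

Note that "pineapple" fails the isin test (it is not exactly "apple") but passes the str.contains test (it has "apple" inside it).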

From the official documentation:

Series.isin(values)

Check whether values are contained in Series. Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.


Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)

Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
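The parameters in that signature change what counts as a match. The sketch below, on a hypothetical Series, shows how regex, case and na affect the result. It is an illustration of the documented parameters, not part of the quoted documentation.

```python
import pandas as pd

# Hypothetical sample data; None stands in for a missing value.
s = pd.Series(["a.b", "axb", "A.B", None])

# Default regex=True: "." is a wildcard, so "axb" matches as well.
# na=False reports missing values as False instead of leaving them missing.
as_regex = s.str.contains("a.b", na=False)

# regex=False: "a.b" is treated as a literal substring.
literal = s.str.contains("a.b", regex=False, na=False)

# case=False: ignore upper/lower case when matching.
ignore_case = s.str.contains("a.b", regex=False, case=False, na=False)

print(as_regex.tolist())     # [True, True, False, False]
print(literal.tolist())      # [True, False, False, False]
print(ignore_case.tolist())  # [True, False, True, False]
```

If the search text may contain regex metacharacters such as "." or "(", passing regex=False is usually the safer choice for a plain substring test.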

Examples:

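import pandas as pd

# Build the small example frame shown below.
df = pd.DataFrame({'a': ['aa', 'ba', 'ca']})

print(df)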
#     a
# 0  aa
# 1  ba
# 2  ca

print(df[df['a'].isin(['aa', 'ca'])])
#     a
# 0  aa
# 2  ca

print(df[df['a'].str.contains('b')])
#     a
# 1  ba
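
For the case described in the question (a different string to look for in each column), one possible approach is to keep the search strings in a dict keyed by column name and apply either test per column. This is only a sketch: the column names, data and the targets dict are hypothetical.

```python
import pandas as pd

# Hypothetical data; the question does not show its dataframe.
df = pd.DataFrame({
    "city": ["Paris", "Lyon", "Nice"],
    "code": ["FR-75", "FR-69", "FR-06"],
})

# Hypothetical mapping: one search string per column.
targets = {"city": "Lyon", "code": "75"}

# Exact match: does any cell in the column equal the string?
exact_found = {
    col: bool(df[col].isin([text]).any())
    for col, text in targets.items()
}

# Substring match: does any cell in the column contain the string?
substring_found = {
    col: bool(df[col].str.contains(text, regex=False).any())
    for col, text in targets.items()
}

print(exact_found)      # {'city': True, 'code': False}
print(substring_found)  # {'city': True, 'code': True}
```

Here "75" is found in the code column only by the substring test, because no cell is exactly "75". Which of the two to use depends on whether the column should equal the string or merely include it.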

Answered by DeepSpace, Oct 18 '22 at 01:10