I want to know if a specific string is present in some columns of my dataframe (a different string for each column).
From what I understand, isin() is written for DataFrames but can work for Series as well, while str.contains() works better for Series.
I don't understand how I should choose between the two. (I searched for similar questions but didn't find any explanation on how to choose between the two.)
Hence the index matters when a Series is passed as the values. When a pandas DataFrame is passed to isin(), both the index and the column labels of the passed DataFrame must match. If two DataFrames hold the same data but their column names differ, the result is False for those columns.
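To make the alignment rule concrete, here is a small sketch. The frame and the values in it are made up for illustration; only DataFrame.isin itself comes from pandas.

```python
import pandas as pd

# Illustrative data, not from the question
df = pd.DataFrame({"a": ["x", "y", "z"], "b": ["x", "q", "z"]})

# A list is compared against every cell; position does not matter.
by_list = df.isin(["x", "z"])

# A Series is aligned on the index: row i is only compared with values[i].
values = pd.Series(["x", "z", "y"], index=[0, 1, 2])
by_series = df.isin(values)

# A DataFrame is aligned on both index and column labels.
# Column "b" has no counterpart in `other`, so it comes back all False.
other = pd.DataFrame({"a": ["x", "y", "z"], "c": ["x", "q", "z"]})
by_frame = df.isin(other)

print(by_list)
print(by_series)
print(by_frame)
```

Note that "z" is present in the Series, yet row 2 of by_series is False, because the Series holds "y" at index 2.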
pandas makes importing and analyzing data much easier. Its isin() method is used to filter DataFrames: it helps select the rows that have a particular value (or one of several values) in a particular column.
Index.contains() returns a boolean indicating whether the provided key is in the index: True if the key is present, False otherwise. Note that Index.contains() was deprecated in pandas 0.25 and removed in pandas 1.0; in current versions, use key in index instead.
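On recent pandas versions, the same membership check on an Index can be written with Python's in operator. A minimal sketch, with dates chosen purely for illustration:

```python
import pandas as pd

# Illustrative index of dates
idx = pd.DatetimeIndex(["2020-01-01", "2020-01-02", "2020-01-03"])

# `in` tests whether the key is one of the index labels
present = pd.Timestamp("2020-01-02") in idx
absent = pd.Timestamp("2021-01-01") in idx

print(present, absent)
```

This checks the index labels only, not the values stored in a Series or DataFrame.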
The DataFrame.isin() method checks whether each element of the DataFrame is contained in the specified values. It returns a DataFrame of booleans: True where the element is present in the specified values, False otherwise.
.isin checks if each value in the column is contained in a list of arbitrary values. Roughly equivalent to value in [value1, value2].
str.contains checks if arbitrary values are contained in each value in the column. Roughly equivalent to substring in large_string.
In other words, .isin works column-wise and is available for all data types. str.contains works element-wise and makes sense only when dealing with strings (or values that can be represented as strings).
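The two "roughly equivalent" expressions can be checked side by side with plain Python. This is only a sketch on a made-up Series:

```python
import pandas as pd

# Illustrative data
s = pd.Series(["apple", "pineapple", "pear"])

# isin: each whole value must equal one of the listed values
exact = s.isin(["apple", "pear"])
exact_py = [value in ["apple", "pear"] for value in s]

# str.contains: the substring may appear anywhere inside each value
# (regex=False so "apple" is treated as a literal string)
partial = s.str.contains("apple", regex=False)
partial_py = ["apple" in value for value in s]

print(exact.tolist(), exact_py)
print(partial.tolist(), partial_py)
```

"pineapple" fails the isin test but passes the str.contains test, which is the practical difference between the two.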
From the official documentation:
Series.isin(values)
Check whether values are contained in Series. Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.
Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)
Test if pattern or regex is contained within a string of a Series or Index.
Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
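Because regex=True is the default, characters such as "." are interpreted as regex metacharacters unless you say otherwise. The sketch below, on made-up data, shows the effect of the regex, case and na parameters from the signature above:

```python
import pandas as pd

# Illustrative data, including a missing value
s = pd.Series(["a.b", "acb", None])

# Default regex=True: "." matches any character, so "acb" matches too.
# na=False makes missing values count as "no match".
as_regex = s.str.contains("a.b", na=False)

# regex=False: the pattern is a literal string
as_literal = s.str.contains("a.b", regex=False, na=False)

# case=False: ignore upper/lower case
ignore_case = s.str.contains("A.B", case=False, regex=False, na=False)

print(as_regex.tolist())
print(as_literal.tolist())
print(ignore_case.tolist())
```

If you are searching for a fixed string rather than a pattern, passing regex=False avoids surprises with special characters.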
Examples:
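import pandas as pd

df = pd.DataFrame({'a': ['aa', 'ba', 'ca']})

print(df)
#     a
# 0  aa
# 1  ba
# 2  ca

# isin: keep rows whose value is exactly 'aa' or 'ca'
print(df[df['a'].isin(['aa', 'ca'])])
#     a
# 0  aa
# 2  ca

# str.contains: keep rows whose value contains 'b' anywhere
print(df[df['a'].str.contains('b')])
#     a
# 1  ba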
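For the situation in the question (a different string for each column), one possible approach is to loop over a mapping of column name to search string. The column names, data and the needles dict below are hypothetical; treat this as a sketch rather than the one right way to do it.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "fruit": ["apple pie", "banana", "cherry"],
    "color": ["red", "yellow", "dark red"],
})

# Hypothetical mapping: column name -> substring to look for
needles = {"fruit": "an", "color": "red"}

# Substring search: one boolean column per searched column
hits = pd.DataFrame({
    col: df[col].str.contains(sub, regex=False, na=False)
    for col, sub in needles.items()
})

# Does each substring occur anywhere in its column?
found_anywhere = hits.any()

# Exact-match alternative: DataFrame.isin accepts a dict of column -> values
exact = df.isin({"fruit": ["banana"], "color": ["red"]})

print(hits)
print(found_anywhere)
print(exact)
```

Use the str.contains version when the string may be part of a longer cell value, and the isin version when the cell must equal the string exactly.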