I'm trying to get a boolean index of whether one column contains a string from the same row in another column:
a     b
boop  beep bop
zorp  zorpfoo
zip   foo zip fa
Checking whether the string in column a appears in column b on the same row, I'd like to get:
[False, True, True]
Right now I'm trying this approach, but it is slow:
df.apply(lambda row: row['a'] in row['b'], axis=1)
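A minimal sketch that rebuilds the sample frame from the table above and confirms that the row-wise apply produces the expected mask:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'a': ['boop', 'zorp', 'zip'],
    'b': ['beep bop', 'zorpfoo', 'foo zip fa'],
})

# Row-wise check: is the string in column 'a' a substring of column 'b'?
mask = df.apply(lambda row: row['a'] in row['b'], axis=1)
print(mask.tolist())  # [False, True, True]
```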
Is there a .str method for this?
You can check whether a column contains a particular value (string or int), or any value from a list, in a pandas DataFrame by using the in operator (against Series.values), pandas.Series.isin(), or Series.str.contains().
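A short sketch of these checks on a hypothetical column (the values are made up for illustration). Note that `in` applied directly to a Series tests the index labels, not the values, so the value check goes through `.values`:

```python
import pandas as pd

# Hypothetical example column
s = pd.Series(['boop', 'zorp', 'zip'])

# `in` on a Series checks the index (0, 1, 2), so test against the values instead
print('zorp' in s.values)   # True
print('zorp' in s)          # False: 'zorp' is not an index label

# isin: element-wise membership test against a list of values
print(s.isin(['zip', 'zap']).tolist())  # [False, False, True]

# str.contains: element-wise substring test against a single pattern
print(s.str.contains('or').tolist())    # [False, True, False]
```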
The str.contains method in pandas searches a column for a specific substring. It returns a boolean Series that is True where the original value contains the substring and False where it does not. It takes a single pattern for the whole column, however, so it cannot compare each row against a different string taken from another column.
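A sketch of str.contains on column b from the question, using one literal pattern (regex=False disables regular-expression matching) and then using the resulting mask to filter rows:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['boop', 'zorp', 'zip'],
    'b': ['beep bop', 'zorpfoo', 'foo zip fa'],
})

# One literal pattern applied to every row of column 'b'
has_zip = df['b'].str.contains('zip', regex=False)
print(has_zip.tolist())  # [False, False, True]

# The result is a boolean mask, so it can filter rows directly
print(df[has_zip])
```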
apply is much faster than iterrows, because apply loops through DataFrame rows more efficiently than iterrows does. itertuples is slower than apply but faster than iterrows, so if an explicit loop is required, use itertuples instead.
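For the row-wise check in the question, the three styles look like this. A sketch on the sample data; itertuples yields namedtuples whose fields are the column names (this works here because a and b are valid Python identifiers):

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['boop', 'zorp', 'zip'],
    'b': ['beep bop', 'zorpfoo', 'foo zip fa'],
})

# iterrows: yields (index, Series) pairs
via_iterrows = [row['a'] in row['b'] for _, row in df.iterrows()]

# apply with axis=1: passes each row to the function as a Series
via_apply = df.apply(lambda row: row['a'] in row['b'], axis=1).tolist()

# itertuples: yields namedtuples; index=False drops the Index field
via_itertuples = [t.a in t.b for t in df.itertuples(index=False)]

print(via_iterrows == via_apply == via_itertuples)  # True
```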
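A quick benchmark on 100,000 random rows, comparing a plain list comprehension over the two columns with row-wise apply (run in IPython for %time):
from random import sample
from string import ascii_lowercase
from pandas import DataFrame

# 100,000 rows: 'a' holds 2-letter strings, 'b' holds 5-letter strings
df = DataFrame({
    'a': [''.join(sample(ascii_lowercase, 2)) for _ in range(100000)],
    'b': [''.join(sample(ascii_lowercase, 5)) for _ in range(100000)],
})

# Plain Python loop over the zipped columns
%time [x in y for x, y in zip(df['a'], df['b'])]
# Row-wise apply (typically much slower)
%time df.apply(lambda row: row['a'] in row['b'], axis=1)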
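The list comprehension returns a plain Python list. A minimal sketch that wraps it in a Series aligned to the DataFrame's index, so it can be used as a boolean mask just like the apply result (sample data from the question; the non-default index is an illustrative assumption):

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['boop', 'zorp', 'zip'],
    'b': ['beep bop', 'zorpfoo', 'foo zip fa'],
}, index=[10, 20, 30])  # non-default index to show alignment is preserved

# Plain-Python substring test per row, wrapped back into a Series
mask = pd.Series([x in y for x, y in zip(df['a'], df['b'])], index=df.index)

print(mask.tolist())  # [False, True, True]
print(df[mask])       # rows 20 and 30
```

This assumes both columns hold strings; a missing value (NaN) in either column would make `x in y` raise a TypeError.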