
String Containment in Pandas

I am trying to select all rows of df where the company1 value is contained in the company2 value of the same row. I am doing it as follows:

df1=df[['company1','company2']][(df.apply(lambda x: x['company1'] in x['company2'], axis=1) == True)]

When I run this, it also matches "South" with "Southern" and "South" with "Route South". I want to exclude all such cases. I have two requirements: company1 must appear only at the beginning of company2, and company1 must not be part of a longer word in company2 (for example, "South" in company1 should not match "Southern" in company2). How should I modify my code to meet both requirements?
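Both unwanted pairs match because Python's in operator only checks for a substring, with no notion of position or word boundaries. As a minimal sketch, the two requirements can be written as a plain-Python predicate. The helper name starts_with_word is hypothetical, and it assumes a word ends at a space or at the end of the string:

```python
def starts_with_word(prefix: str, text: str) -> bool:
    # Hypothetical helper. Requirement 1: prefix must sit at the very start of text.
    if not text.startswith(prefix):
        return False
    # Requirement 2: prefix must not be the start of a longer word, so it must be
    # followed by a space or by the end of the string (assumed word separators).
    rest = text[len(prefix):]
    return rest == "" or rest[0] == " "


pairs = [("South", "Southern"), ("South", "Route South"), ("South", "South Route")]
for a, b in pairs:
    # `in` is True for all three pairs; starts_with_word is True only for 'South Route'.
    print(f"{a!r} in {b!r}: {a in b}; starts_with_word: {starts_with_word(a, b)}")
```

If names can also be followed by punctuation such as a comma or hyphen, the separator check would need to accept those characters too.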

asked Jan 26 '26 10:01 by ComplexData



1 Answer

I think you need a regex that anchors company1 at the start of company2 and requires a space after it:

import pandas as pd

df = pd.DataFrame({'company1': {0: 'South', 1: 'South', 2: 'South'},
                   'company2': {0: 'Southern', 1: 'Route South', 2: 'South Route'}})

print (df)
  company1     company2
0    South     Southern
1    South  Route South
2    South  South Route

# '^South ' matches only when company1 starts company2 and is followed by a space
df1 = df[df['company2'].str.contains("|".join('^' + df['company1'] + ' '))]
print (df1)
  company1     company2
2    South  South Route
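One caveat with this pattern: "|".join(...) builds a single regex from every value in company1, so a row is kept if its company2 starts with any row's company1, not necessarily its own. It also leaves regex metacharacters unescaped and drops a company2 that equals company1 exactly (no trailing space), if that case should count. A row-wise sketch, assuming whole words are separated by whitespace; the helper starts_with_whole_word and the extra 'North' row are hypothetical, added only to show the difference:

```python
import re

import pandas as pd

# Example data; the 'North' row (index 3) is hypothetical and only there to
# show a row whose own company1 does not start its company2.
df = pd.DataFrame({'company1': ['South', 'South', 'South', 'North'],
                   'company2': ['Southern', 'Route South', 'South Route', 'South Route']})


def starts_with_whole_word(prefix: str, text: str) -> bool:
    # re.escape makes characters such as '.', '&' or '(' match literally;
    # re.match anchors at the start of text; (?:\s|$) requires whitespace
    # or the end of the string right after the prefix.
    return re.match(re.escape(prefix) + r'(?:\s|$)', text) is not None


# Compare each row's company1 only with the company2 of the same row.
mask = [starts_with_whole_word(a, b) for a, b in zip(df['company1'], df['company2'])]
df1 = df.loc[mask]
print(df1)

# For comparison: the joined pattern tests every company2 against all
# company1 values at once, so the 'North' row is kept as well.
joined = df[df['company2'].str.contains('|'.join('^' + df['company1'] + ' '))]
print(joined)
```

The list comprehension is not vectorized, but it keeps each comparison tied to its own row, which a single joined pattern cannot do.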
answered Jan 28 '26 00:01 by jezrael
