How to use \b word boundary in pandas str.contains?

Tags: python, string, regex, pandas, word-boundary

Is there an equivalent of the \b word boundary when using str.contains?

The following code mistakenly puts "Said Business School" in the category because of 'Sa'. If I could create a word boundary, it would solve the problem; putting a space after 'Sa' does not work properly. I am using pandas (df is a DataFrame). I know I can use a regex, but I am curious whether I can use plain strings to make it faster.

gprivate_n = ('Co|Inc|Llc|Group|Ltd|Corp|Plc|Sa |Insurance|Ag|As|Media|&|Corporation')
df.loc[df[df.Name.str.contains('{0}'.format(gprivate_n))].index, "Private"] = 1


asked Mar 12 '14 17:03

user3314418

2 Answers

This is the same old Python regex issue: '\b' has to be passed either in a raw string (r'\b...') or, less desirably, double-escaped ('\\b'). In an ordinary string literal, '\b' is the backspace character, not a word boundary.

So your regex should be:

gprivate_n = (r'\b(Co|Inc|Llc|Group|Ltd|Corp|Plc|Sa |Insurance|Ag|As|Media|&|Corporation)')
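As a rough sketch of how this could look end to end, the snippet below closes the boundary on both sides so that 'Co' does not match inside a longer word. It is not the answerer's exact pattern: the sample names are made up, the trailing space after 'Sa' is dropped, the group is made non-capturing, and '&' is moved outside the \b wrapper (all of these are assumptions for illustration).

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"Name": [
    "Said Business School",
    "Acme Inc",
    "Nestle Sa",
    "Johnson & Johnson",
    "Costco Wholesale",
]})

words = ["Co", "Inc", "Llc", "Group", "Ltd", "Corp", "Plc", "Sa",
         "Insurance", "Ag", "As", "Media", "Corporation"]

# Raw strings keep \b as a regex word boundary instead of a backspace.
# (?:...) is a non-capturing group, so pandas does not warn about match groups.
# "&" is not a word character, so \b would not match next to it when it is
# surrounded by spaces; it is therefore added as a separate alternative.
gprivate_n = r"\b(?:" + "|".join(words) + r")\b|&"

mask = df["Name"].str.contains(gprivate_n, regex=True)
df["Private"] = 0
df.loc[mask, "Private"] = 1
```

With this sample, "Said Business School" and "Costco Wholesale" are left at 0, while the other three rows are set to 1. The match is still case-sensitive, as in the question.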


answered Oct 19 '22 21:10

smci

A word boundary is not a character, so a plain substring search cannot find it. You need to either use a regex or split each string into words and then check whether any of those words is in the set you currently have defined in gprivate_n.
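A minimal sketch of the split-and-check approach, in plain Python: the helper name has_private_word and the sample names are hypothetical, and stripping trailing '.' and ',' is an added assumption so that "Inc." still counts as "Inc".

```python
# Word set mirroring the question's list (with the trailing space after "Sa" removed).
PRIVATE_WORDS = {"Co", "Inc", "Llc", "Group", "Ltd", "Corp", "Plc", "Sa",
                 "Insurance", "Ag", "As", "Media", "&", "Corporation"}


def has_private_word(name):
    """Return True if any whitespace-separated word of name is in PRIVATE_WORDS."""
    # Strip trailing punctuation such as "Inc." or "Corp," before comparing.
    words = (w.strip(".,") for w in name.split())
    return any(w in PRIVATE_WORDS for w in words)


names = ["Said Business School", "Acme Inc.", "Johnson & Johnson"]
flags = [has_private_word(n) for n in names]
```

On a DataFrame like the one in the question, the same helper could presumably be applied per row with df.Name.apply(has_private_word). Whether this is faster than the regex is not established here and would need to be measured.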

answered Oct 19 '22 20:10

RexE
