I am wondering how to write a regex pattern to find strings in which any word in a list is not preceded by another word.
To give context, imagine two lists of words:
Parts = ['spout', 'handle', 'base']
Objects = ['jar', 'bottle']
Imagine the following strings:
string = 'Jar with spout and base'
string2 = 'spout of jar'
string3 = 'handle of jar'
string4 = 'base of bottle with one handle'
string5 = 'bottle base'
I want to write a rule so that if we have an expression like "spout of jar", "handle of bottle", or "bottle base", I can output a statement like "object is fragment of jar, has part spout/base" into a dataframe, but if we have an expression like "jar with spout", I can output a statement like "object is jar, has part spout".
Basically, I want to write a rule so that if any word in Parts is in the string, we write that the object is a fragment, unless the word is preceded by 'with'.
So I wrote this, with negative lookbehind followed by .* followed by any word in Parts:
rf"(?!with)(.*)(?:{'|'.join(Part)})"
But this just does not seem to work: "jar with spout" will still match this pattern when I try it in Python.
So I am just not sure how to write a regex pattern that excludes any expression involving 'with' followed by any sequence of characters, followed by a word in Parts.
Super grateful for any help that can be provided here!
You can easily write such a pattern with the PyPI regex library (install with pip install regex), which, unlike the built-in re module, supports variable-width lookbehinds:
(?<!\bwith\b.*?)\b(?:spout|handle|base)\b
See the regex demo. Details:
(?<!\bwith\b.*?) - immediately to the left of the current location, there should be no whole word with and any zero or more chars other than line break chars, as few as possible
\b(?:spout|handle|base)\b - a whole word spout, handle, or base.
See the Python demo:
import regex
Parts = ['spout', 'handle', 'base']
Objects = ['jar', 'bottle']
strings = ['Jar with spout and base','spout of jar','handle of jar','base of bottle with one handle','bottle base']
pattern = regex.compile(rf"(?<!\bwith\b.*?)\b(?:{'|'.join(Parts)})\b")
print( list(filter(pattern.search, strings)) )
# => ['spout of jar', 'handle of jar', 'base of bottle with one handle', 'bottle base']
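If you would rather stay with the built-in re module, the same rule can be approximated without a lookbehind by comparing match positions: a part counts as "not preceded by with" when it starts before the first whole-word with. The following is only a rough sketch under some assumptions: the describe helper and its output wording are hypothetical (modelled on the statements in the question), each string is a single line, and every part found is listed in the result, which may or may not be what you want for mixed cases.

```python
import re

PARTS = ['spout', 'handle', 'base']
OBJECTS = ['jar', 'bottle']


def word_pattern(words):
    # Whole-word, case-insensitive alternation; re.escape guards against
    # regex metacharacters in the word lists.
    alternation = '|'.join(map(re.escape, words))
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


PART_RE = word_pattern(PARTS)
OBJECT_RE = word_pattern(OBJECTS)
WITH_RE = word_pattern(['with'])


def describe(text):
    """Hypothetical helper: build a description string, or None if no object word occurs."""
    obj = OBJECT_RE.search(text)
    if obj is None:
        return None
    name = obj.group(0).lower()

    part_matches = list(PART_RE.finditer(text))
    if not part_matches:
        return f"object is {name}"

    # Position of the first whole-word "with"; if absent, the whole string counts.
    with_match = WITH_RE.search(text)
    cutoff = with_match.start() if with_match else len(text)

    # A part that starts before the first "with" marks the object as a fragment.
    is_fragment = any(m.start() < cutoff for m in part_matches)
    parts = '/'.join(m.group(0).lower() for m in part_matches)
    prefix = "fragment of " if is_fragment else ""
    return f"object is {prefix}{name}, has part {parts}"


if __name__ == "__main__":
    for s in ['Jar with spout and base', 'spout of jar', 'bottle base']:
        print(s, '->', describe(s))
```

The returned strings can then be collected into a list and assigned to a dataframe column. For single-line input this position check should select the same strings as the lookbehind pattern above, since both ask whether a part word occurs with no whole-word with anywhere to its left.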