
Find a word not preceded by another word

Tags: python, regex

I am wondering how to write a regex pattern to find strings in which any word from a list is not preceded by another word.

To give context, imagine two lists of words:

Parts = ['spout', 'handle', 'base']
Objects = ['jar', 'bottle']

Imagine the following strings

string = 'Jar with spout and base'
string2 = 'spout of jar'
string3 = 'handle of jar'
string4 = 'base of bottle with one handle' 
string5 = 'bottle base'

I want to write a rule so that if we have an expression like "spout of jar", "handle of bottle", or "bottle base", I can output a statement like "object is fragment of jar, has part spout/base" into a dataframe, but if we have an expression like "jar with spout", I can output a statement like "object is jug, has part spout".

Basically, I want to write a rule so that if any word in Parts is in the string, we write that the object is a fragment, unless the word is preceded by 'with'.

So I wrote this, with negative lookbehind followed by .* followed by any word in Parts:

rf"(?!with)(.*)(?:{'|'.join(Parts)})"

But this just does not seem to work: "jar with spout" will still match this pattern when I try it in Python.

So I am not sure how to write a regex pattern that excludes any expression in which 'with' is followed by any sequence of characters and then by a word in Parts.

Super grateful for any help that can be provided here!

kylemaxim asked Nov 19 '25 10:11


1 Answer

You can easily write such a pattern with the PyPI regex library (install with pip install regex), which, unlike the built-in re module, supports variable-width lookbehinds:

(?<!\bwith\b.*?)\b(?:spout|handle|base)\b

Details:

  • (?<!\bwith\b.*?) - a negative lookbehind that fails the match if, to the left of the current location, there is a whole word with followed by any zero or more chars other than line break chars, as few as possible
  • \b(?:spout|handle|base)\b - a whole word spout, handle, or base.
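
If installing a third-party package is not an option, the same check can be approximated with the standard re module by splitting it into two steps: find the first part word, then look for a whole word with in the text before it. This is only a sketch under the assumption that each string is a single line; the helper name has_part_not_after_with is made up for illustration and is not part of either library.

```python
import re

PARTS = ['spout', 'handle', 'base']

# Whole-word match for any part; re.escape guards against special characters.
part_re = re.compile(rf"\b(?:{'|'.join(map(re.escape, PARTS))})\b")
# Whole-word match for the blocking word.
with_re = re.compile(r"\bwith\b")

def has_part_not_after_with(text):
    """Hypothetical helper: True if a part word occurs with no 'with' before it.

    Only the first part word needs checking: if 'with' precedes it,
    then 'with' also precedes every later part word in the string.
    """
    first = part_re.search(text)
    if first is None:
        return False
    return with_re.search(text[:first.start()]) is None

strings = ['Jar with spout and base', 'spout of jar', 'handle of jar',
           'base of bottle with one handle', 'bottle base']
print([s for s in strings if has_part_not_after_with(s)])
# => ['spout of jar', 'handle of jar', 'base of bottle with one handle', 'bottle base']
```

Like the pattern above, this is case-sensitive, so a capitalized With would not block a match unless re.IGNORECASE is added to both patterns.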

See the Python demo:

import regex
Parts = ['spout', 'handle', 'base']
Objects = ['jar', 'bottle']
strings = ['Jar with spout and base','spout of jar','handle of jar','base of bottle with one handle','bottle base']
pattern = regex.compile(rf"(?<!\bwith\b.*?)\b(?:{'|'.join(Parts)})\b")
print( list(filter(pattern.search, strings)) )
# => ['spout of jar', 'handle of jar', 'base of bottle with one handle', 'bottle base']
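
To get from a yes/no match to the rows the question wants in a dataframe, one option is to build a record per string. The sketch below uses only the standard re module; the function name classify and the field names are assumptions chosen for illustration, and it matches case-insensitively so that 'Jar' is recognized, which differs from the case-sensitive pattern above.

```python
import re

PARTS = ['spout', 'handle', 'base']
OBJECTS = ['jar', 'bottle']

def word_alternation(words):
    # Whole-word, case-insensitive alternation over a list of literal words.
    return re.compile(rf"\b(?:{'|'.join(map(re.escape, words))})\b", re.IGNORECASE)

part_re = word_alternation(PARTS)
object_re = word_alternation(OBJECTS)
with_re = word_alternation(['with'])

def classify(text):
    """Hypothetical helper: build one record (dict) per input string."""
    first_part = part_re.search(text)
    obj = object_re.search(text)
    # Fragment: a part word appears and no 'with' occurs anywhere before it.
    is_fragment = (first_part is not None
                   and with_re.search(text[:first_part.start()]) is None)
    return {
        'text': text,
        'object': obj.group().lower() if obj else None,
        'is_fragment': is_fragment,
        'parts': [m.group().lower() for m in part_re.finditer(text)],
    }

strings = ['Jar with spout and base', 'spout of jar', 'handle of jar',
           'base of bottle with one handle', 'bottle base']
rows = [classify(s) for s in strings]
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pandas.DataFrame(rows) if pandas is available. Whether a part that follows with (such as handle in the fourth string) should be listed for a fragment is a judgment call; here every part word found is recorded.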

Wiktor Stribiżew answered Nov 21 '25 00:11


