Regex match back to a period or start of string

Tags:

python

regex

I'd like to match a word, then get everything before it up to the nearest preceding period or the start of the string.

For example, given this string and searching for the word "regex":

s = 'Do not match this. Or this. Or this either. I like regex. It is hard, but regex is also rewarding.'

It should return:

>> I like regex.
>> It is hard, but regex is also rewarding.

I'm trying to get my head around look-aheads and look-behinds, but (it seems) you can't easily look back until you hit something, only if it's immediately next to your pattern. I can get pretty close with this:

import re
pattern = re.compile(r'(?:(?<=\.)|(?<=^))(.*?regex.*?\.)')

But the first match starts at the beginning of the string and runs past several periods until it reaches "regex":

>> Do not match this. Or this. Or this either. I like regex.  # no!
>> It is hard, but regex is also rewarding.                   # correct

asked Jul 20 '17 00:07

JeffThompson

1 Answer

You don't need to use lookarounds to do that. The negated character class is your best friend:

(?:[^\s.][^.]*)?regex[^.]*\.?

[^.]*regex[^.]*\.?

This way you match any characters before the word "regex" while forbidding any of them from being a dot.

The first pattern strips whitespace on the left; the second one is more basic and keeps the space that follows the previous period.
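
A minimal sketch that runs both patterns with Python's re.findall on the example string from the question. The variable names (stripped, basic) are only illustrative.

```python
import re

s = ('Do not match this. Or this. Or this either. '
     'I like regex. It is hard, but regex is also rewarding.')

# Optional leading group: its first character must be neither whitespace nor a dot,
# so a match never starts on the space after the previous period.
stripped = re.findall(r'(?:[^\s.][^.]*)?regex[^.]*\.?', s)

# Simpler variant: [^.]* cannot cross a period, but it keeps the leading space.
basic = re.findall(r'[^.]*regex[^.]*\.?', s)

for m in stripped:
    print(repr(m))
# 'I like regex.'
# 'It is hard, but regex is also rewarding.'

for m in basic:
    print(repr(m))
# ' I like regex.'
# ' It is hard, but regex is also rewarding.'
```

The trailing \.? is optional so that a final sentence without a period can still match.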

About your pattern:

Don't forget that a regex engine tries to find a match at each position, from left to right through the string. That's why something like (?:(?<=\.)|(?<=^)).*?regex doesn't always return the shortest substring between a dot (or the start of the string) and the word "regex", even if you use a non-greedy quantifier. The leftmost starting position always wins, and a non-greedy quantifier only takes characters until the rest of the pattern succeeds. In your string, position 0 already satisfies (?<=^), so .*? extends from there across every period until it reaches "regex".
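
A minimal sketch reproducing this with the question's own pattern: the first match starts at position 0 and contains several periods, even though the quantifier is non-greedy. The variable names are only illustrative.

```python
import re

s = ('Do not match this. Or this. Or this either. '
     'I like regex. It is hard, but regex is also rewarding.')

# The question's pattern: after a period or at the start, then a lazy .*?
pattern = re.compile(r'(?:(?<=\.)|(?<=^))(.*?regex.*?\.)')

m = pattern.search(s)
# Position 0 satisfies (?<=^), so the engine commits to a match starting there;
# .*? then has to extend across every period until it reaches "regex".
print(m.start())    # 0
print(m.group(1))   # Do not match this. Or this. Or this either. I like regex.

# findall returns group 1 of each match, reproducing the output from the question
results = pattern.findall(s)
```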

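As an aside, the negated character class helps here too: (?:(?<=\.)|(?<=^)) can be shortened to (?<![^.]), which means "not preceded by a character other than a dot" and therefore succeeds at the start of the string and right after a dot.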
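
A minimal sketch checking that the two assertions succeed at exactly the same positions of the example string. Both are zero-width, so re.finditer yields empty matches whose start() is the position where the assertion holds; the variable names are only illustrative.

```python
import re

s = ('Do not match this. Or this. Or this either. '
     'I like regex. It is hard, but regex is also rewarding.')

long_form = r'(?:(?<=\.)|(?<=^))'
short_form = r'(?<![^.])'   # not preceded by a character that is not a dot

# Collect every position where each zero-width assertion succeeds.
long_pos = [m.start() for m in re.finditer(long_form, s)]
short_pos = [m.start() for m in re.finditer(short_form, s)]

print(long_pos == short_pos)   # True
```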

answered Oct 20 '22 10:10

Casimir et Hippolyte
