
Regex - Match n occurrences of substring within any m-lettered window

Tags: regex

I am having trouble writing a regex that matches a given pattern at least n times within any window of m characters of the input string. For example, suppose my input string is:

00000001100000001110111100000000000000000000000000000000000000000000000000110000000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001100

I want to detect every case where a 1 appears at least 7 times (not necessarily consecutively) within a window of up to 20 characters.

So far I have built this expression:

(1[^1]*?){7,}

which detects all cases where a 1 appears at least 7 times in the input string, but it matches both:

11000000011101111

and the

1100000001110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011

whereas I want to keep only the first one, since it fits within a substring of fewer than 20 characters.

I tried to combine the aforementioned regex with:

(?=(^[01]{0,20}))

so that it only matches parts of the string made up of '1's and '0's that are at most 20 characters long, but when I do that it stops working.

Does anyone have an idea how to accomplish this? I have put this example in regex101 as a quick reference.

Thank you very much!

asked Dec 03 '25 18:12 by dimly


1 Answer

This is not something a single regex can do without listing out every possible string that qualifies. You would need to iterate over the string instead.
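To illustrate what iterating over the string could look like, here is a minimal sketch that avoids regex entirely. It collects the indices of every 1, then checks each group of n consecutive indices to see whether it fits in m characters. The function name dense_windows and its parameters n, m and ch are made up for this example and are not part of the question.

```python
def dense_windows(text, n=7, m=20, ch="1"):
    """Return (start, end) spans, end exclusive, that begin and end on `ch`,
    contain exactly n occurrences of `ch`, and are at most m characters long."""
    # Indices of every occurrence of the character we are counting.
    positions = [i for i, c in enumerate(text) if c == ch]
    spans = []
    # Slide over each run of n consecutive occurrences.
    for k in range(len(positions) - n + 1):
        start = positions[k]
        end = positions[k + n - 1] + 1
        if end - start <= m:
            spans.append((start, end))
    return spans
```

Each returned span is the shortest window starting at that 1, so any window with at least n ones within m characters contains at least one of these spans.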

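You could also let a regex find every candidate (each stretch from a 1 to the 7th 1 that follows it) and then iterate over the matches, keeping only those short enough. Example in Python: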

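import re
string = "00000001100000001110111100000"  # sample input (an assumption); substitute your own
# The lookahead lets matches overlap: one candidate per starting 1, ending at the 7th 1
matches = re.finditer(r'(?=((?:1[^1]*?){7}))', string)
# Keep only the candidates that span at most 20 characters
matches = [match.group(1) for match in matches if len(match.group(1)) <= 20]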
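The filtered candidates usually overlap, since every 1 inside a dense stretch starts its own candidate. If you would rather get one region per dense stretch, a possible follow-up is to merge candidates that overlap or touch. This is only a sketch: the function name dense_regions and the parameters n and m are hypothetical, and it assumes the same lookahead pattern as above.

```python
import re

def dense_regions(text, n=7, m=20):
    """Merge overlapping or touching candidate windows into (start, end)
    regions, end exclusive. Each candidate runs from a 1 to the n-th 1 after it."""
    pattern = re.compile(r'(?=((?:1[^1]*?){%d}))' % n)
    regions = []
    for match in pattern.finditer(text):
        start, end = match.span(1)
        if end - start > m:
            continue  # the n ones are spread over more than m characters
        if regions and start <= regions[-1][1]:
            # Candidate overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

On the input from the question, slicing the text with a returned region should give back the whole dense part, such as the 17-character substring the asker wants to keep.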
answered Dec 06 '25 09:12 by Anonymous


