
Regex subsequence matching

Tags: python, regex

I'm using Python, but code in any language will do for this question.

Suppose I have 2 strings.

sequence = 'abcd'
string = 'axyzbdclkd'

In the above example, sequence is a subsequence of string.

How can I check whether sequence is a subsequence of string using regex? Also check the examples here for the difference between a subsequence and a subarray, and for what I mean by subsequence.

The only thing I could think of is this, but it's far from what I want.

import re
c = re.compile('abcd')
c.match('axyzbdclkd')
Abhishek Jebaraj asked Mar 02 '17 12:03


2 Answers

Just allow arbitrary strings in between:

import re

c = re.compile('.*a.*b.*c.*d.*')
# .* matches any character (except a newline, by default), zero or more times
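As a quick check of the pattern above, here is a minimal sketch that runs it against the string from the question and against a string containing the same letters in the wrong order. The second test string is made up for illustration.

```python
import re

# Hand-written pattern for the sequence 'abcd', as in the answer above.
c = re.compile('.*a.*b.*c.*d.*')

# match() anchors at the start of the string, which is why the pattern
# begins with '.*' to skip any leading characters.
print(bool(c.match('axyzbdclkd')))  # True: 'a', 'b', 'c', 'd' appear in order
print(bool(c.match('dcba')))        # False: same letters, wrong order
```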
user2390182 answered Nov 09 '22 23:11


For an arbitrary sequence, you can construct a regex like this:

import re

sequence = 'abcd'
rgx = re.compile('.*'.join(re.escape(x) for x in sequence))

which will, for 'abcd', result in the regex 'a.*b.*c.*d'. You can then use rgx.search(..):

the_string = 'axyzbdclkd'
if rgx.search(the_string):
    # ... the sequence is a subsequence.
    pass

By using re.escape(..) you know for sure that, for instance, a '.' in the original sequence will be translated to '\.' and thus match only a literal dot rather than any character.
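The approach above could be wrapped in a small helper, sketched below. The name is_subsequence and the use of re.DOTALL are assumptions added for illustration and are not part of the answer. re.DOTALL only makes '.' match newlines as well, so multi-line strings are handled.

```python
import re

def is_subsequence(sequence, string):
    """Return True if the characters of `sequence` occur in `string` in order."""
    # Escape each character so regex metacharacters are matched literally.
    pattern = '.*'.join(re.escape(ch) for ch in sequence)
    # re.DOTALL (an assumption, not in the answer) lets '.' span newlines too.
    return re.search(pattern, string, re.DOTALL) is not None

print(is_subsequence('abcd', 'axyzbdclkd'))  # True
print(is_subsequence('a.c', 'abc'))          # False: '.' must match a literal dot
print(is_subsequence('a.c', 'a-.-c'))        # True
```

Without re.escape(..), the second call would return True, because an unescaped '.' would match the 'b'.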

Willem Van Onsem answered Nov 10 '22 01:11