 

Cut within a pattern using Python regex

Objective: I am trying to perform a cut with Python regex in a case where `re.split` doesn't quite do what I want. I need to cut inside a pattern, between two of its characters.

What I am looking for:

I need to recognize the pattern below in a string and split the string at the location of the pipe. The pipe isn't actually in the string; it just shows where I want to split.

Pattern: CDE|FG

String: ABCDEFGHIJKLMNOCDEFGZYPE

Results: ['ABCDE', 'FGHIJKLMNOCDE', 'FGZYPE']
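One way to pin down the desired cut is to find every place where the left half occurs with the right half immediately after it, and slice at the end of the left half. The sketch below does that with `re.finditer` and a lookahead. The helper name `cut_at_pipe` is hypothetical, and it assumes both halves of the pattern are literal text (hence `re.escape`).

```python
import re

def cut_at_pipe(s, pattern):
    """Hypothetical helper: split s at the '|' position of every match of pattern.

    Both halves of pattern are treated as literal text.
    """
    left, right = pattern.split('|', 1)
    # Match the left half only when the right half follows; the lookahead
    # does not consume the right half, so it stays at the start of the next piece.
    finder = re.compile(re.escape(left) + '(?=' + re.escape(right) + ')')
    pieces, start = [], 0
    for m in finder.finditer(s):
        cut = m.end()  # the cut lands between the two halves
        pieces.append(s[start:cut])
        start = cut
    pieces.append(s[start:])  # whatever follows the last cut site
    return pieces

print(cut_at_pipe('ABCDEFGHIJKLMNOCDEFGZYPE', 'CDE|FG'))
# ['ABCDE', 'FGHIJKLMNOCDE', 'FGZYPE']
```

Because it only slices at computed indices, this also runs on Python versions older than 3.7, where `re.split` cannot split on zero-width matches.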

What I have tried:

It seems like using split with parentheses is close, but it doesn't keep the search pattern attached to the results the way I need it to.

re.split('CDE()FG', 'ABCDEFGHIJKLMNOCDEFGZYPE')

Gives,

['AB', 'HIJKLMNO', 'ZYPE']

When I actually need,

['ABCDE', 'FGHIJKLMNOCDE', 'FGZYPE']
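The empty group in `'CDE()FG'` doesn't help because the whole match `CDEFG` is still consumed by the split. A lookbehind/lookahead pair matches only the position between `E` and `F`, consuming nothing, so both halves stay attached to their pieces. A minimal sketch, assuming Python 3.7 or newer (earlier versions do not split on empty matches, and Python's lookbehind must be fixed-width):

```python
import re

s = 'ABCDEFGHIJKLMNOCDEFGZYPE'
# (?<=CDE) asserts that "CDE" ends here, (?=FG) asserts that "FG" starts here;
# neither consumes characters, so the split happens between E and F.
parts = re.split(r'(?<=CDE)(?=FG)', s)
print(parts)  # ['ABCDE', 'FGHIJKLMNOCDE', 'FGZYPE']
```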

Motivation:

I'm practicing with regex and wanted to see whether I could use it to write a script that predicts the fragments of a protein digestion by specific proteases.

asked Jun 20 '16 at 17:06 by Michael Molter


1 Answer

A non-regex way would be to replace the matched text with the piped pattern and then split on the pipe.

>>> pattern = 'CDE|FG'
>>> s = 'ABCDEFGHIJKLMNOCDEFGZYPE'
>>> s.replace('CDEFG',pattern).split('|')
['ABCDE', 'FGHIJKLMNOCDE', 'FGZYPE']
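The same idea can be wrapped so that the text to replace is derived from the pattern itself. `cut_with_replace` below is a hypothetical helper, not part of the answer; it assumes the separator never occurs in the input string and refuses to run if it does.

```python
def cut_with_replace(s, pattern, sep='|'):
    """Hypothetical helper generalizing the str.replace + str.split trick."""
    if sep in s:
        # The trick relies on sep marking only the inserted cut points.
        raise ValueError('separator already occurs in the input string')
    left, right = pattern.split(sep, 1)
    # Turn every "CDEFG" into "CDE|FG", then split on the inserted "|".
    return s.replace(left + right, left + sep + right).split(sep)

print(cut_with_replace('ABCDEFGHIJKLMNOCDEFGZYPE', 'CDE|FG'))
# ['ABCDE', 'FGHIJKLMNOCDE', 'FGZYPE']
```

Note that `str.replace` only handles non-overlapping occurrences, so results can differ from a lookaround-based split when matches overlap: `cut_with_replace('AAA', 'A|A')` gives `['A', 'AA']`, while `re.split(r'(?<=A)(?=A)', 'AAA')` gives `['A', 'A', 'A']`.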
answered Sep 29 '22 at 19:09 by Bhargav Rao