
How to extract the substring between two markers?

Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know the few characters directly before (AAA) and directly after (ZZZ) the part I am interested in (1234).

With sed it is possible to do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|" 

And this will give me 1234 as a result.

How to do the same thing in Python?

ria asked Jan 12 '11 09:01



2 Answers

Using regular expressions (see the re module documentation for further reference):

    import re

    text = 'gfgfdAAA1234ZZZuijjk'

    m = re.search('AAA(.+?)ZZZ', text)
    if m:
        found = m.group(1)

    # found: 1234

or:

    import re

    text = 'gfgfdAAA1234ZZZuijjk'

    try:
        found = re.search('AAA(.+?)ZZZ', text).group(1)
    except AttributeError:
        # AAA, ZZZ not found in the original string
        found = ''  # apply your error handling

    # found: 1234
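
If the markers are not fixed literals like AAA and ZZZ, they may contain characters that have a special meaning in regular expressions. A minimal sketch, assuming a hypothetical helper named extract_between (it is not part of the answer above), escapes both markers with re.escape before building the pattern:

```python
import re


def extract_between(text, start_marker, end_marker, default=''):
    """Return the shortest text between the two markers, or default.

    extract_between is a hypothetical helper name used for illustration.
    """
    # re.escape makes characters such as '[' or '.' match literally.
    pattern = re.escape(start_marker) + '(.+?)' + re.escape(end_marker)
    m = re.search(pattern, text)
    return m.group(1) if m else default


print(extract_between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(extract_between('a[[x.y]]b', '[[', ']]'))               # x.y
```

As in the snippets above, (.+?) is non-greedy, so the match stops at the first end marker after the start marker. It also needs at least one character between the markers and does not cross newlines unless re.DOTALL is passed.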

eumiro answered Sep 20 '22 05:09


    >>> s = 'gfgfdAAA1234ZZZuijjk'
    >>> start = s.find('AAA') + 3
    >>> end = s.find('ZZZ', start)
    >>> s[start:end]
    '1234'

Then you can use regexps with the re module as well, if you want, but that's not necessary in your case.
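
One caveat with this approach: str.find returns -1 when the substring is missing, so the slice silently produces a wrong result if either marker is absent. A sketch of the same idea with that case handled, assuming a hypothetical function name between_markers:

```python
def between_markers(s, start_marker, end_marker):
    """Return the text between the markers, or None if either is missing.

    between_markers is a hypothetical name used for illustration.
    """
    start = s.find(start_marker)
    if start == -1:
        return None  # start marker not found
    start += len(start_marker)  # skip past the start marker itself
    end = s.find(end_marker, start)
    if end == -1:
        return None  # end marker not found after the start marker
    return s[start:end]


print(between_markers('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(between_markers('gfgfd1234ZZZuijjk', 'AAA', 'ZZZ'))     # None
```

Using len(start_marker) instead of the literal 3 keeps the offset correct when the marker changes.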

Lennart Regebro answered Sep 20 '22 05:09