Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.
I only know the few characters directly before (AAA) and directly after (ZZZ) the part I am interested in (1234).
With sed it is possible to do something like this with a string:
echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"
And this will give me 1234 as a result.
How can I do the same thing in Python?
To find a string between two strings in Python, use re.search(). This function from the standard re module scans a string for a pattern and returns a Match object if it finds a match, or None if it does not. If the pattern matches in more than one place, only the first occurrence is returned.
Python strings also have a built-in method for splitting: split(). It takes two optional parameters. The first, sep, is the separator string to split on (it may be longer than one character); the second, maxsplit, limits how many splits are performed. Splitting once on the start marker and then once on the end marker isolates the text between them.
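As a rough sketch of the split() approach described above: the helper name between_split and the choice to return an empty string when a marker is missing are assumptions made for illustration, not part of the original answers.

```python
def between_split(s, start, end):
    """Return the text between the first `start` and the next `end`, or '' if either is missing."""
    # maxsplit=1 splits only at the first occurrence of the start marker
    parts = s.split(start, 1)
    if len(parts) < 2:
        return ''  # start marker not found
    # split what follows the start marker at the first end marker
    rest = parts[1].split(end, 1)
    if len(rest) < 2:
        return ''  # end marker not found after the start marker
    return rest[0]


text = 'gfgfdAAA1234ZZZuijjk'
found = between_split(text, 'AAA', 'ZZZ')
print(found)
```

Because both splits use maxsplit=1, later occurrences of either marker in the string are left alone.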
Another approach works directly with positions. Given a string and two substrings, get the index of each substring with index(), then take the characters between those two positions, either with a loop or, more simply, with a slice. Note that index() raises ValueError when the substring is not found.
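The index() approach might look like the following sketch. It uses a slice in place of the loop, and the helper name between_index and the empty-string fallback are hypothetical choices for this example.

```python
def between_index(s, start, end):
    """Return the text between the first `start` and the next `end`, or '' if either is missing."""
    try:
        # position just past the start marker
        i = s.index(start) + len(start)
        # search for the end marker only after the start marker
        j = s.index(end, i)
    except ValueError:
        # index() raises ValueError when a marker is absent
        return ''
    return s[i:j]


text = 'gfgfdAAA1234ZZZuijjk'
found = between_index(text, 'AAA', 'ZZZ')
print(found)
```

Using len(start) rather than a hard-coded offset keeps the helper correct for markers of any length.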
Using regular expressions (see the re module documentation for further reference):

import re

text = 'gfgfdAAA1234ZZZuijjk'

m = re.search('AAA(.+?)ZZZ', text)
if m:
    found = m.group(1)

# found: 1234
or:
import re

text = 'gfgfdAAA1234ZZZuijjk'

try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string
    found = ''  # apply your error handling

# found: 1234
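If the markers are not fixed text like AAA and ZZZ but may contain characters that are special in regular expressions (brackets, dots and so on), one option is to escape them with re.escape() before building the pattern. The helper below is only a sketch; the name between_regex and the empty-string fallback are assumptions for illustration.

```python
import re


def between_regex(s, start, end):
    """Return the shortest non-empty text between `start` and `end`, or '' if there is no match."""
    # re.escape makes the markers match literally, even if they contain regex metacharacters
    pattern = re.escape(start) + r'(.+?)' + re.escape(end)
    m = re.search(pattern, s)
    return m.group(1) if m else ''


found = between_regex('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ')
bracketed = between_regex('price: [42] USD', '[', ']')
print(found, bracketed)
```

The non-greedy (.+?) stops at the first end marker after the start marker, matching the behaviour of the pattern used above.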
>>> s = 'gfgfdAAA1234ZZZuijjk'
>>> start = s.find('AAA') + 3
>>> end = s.find('ZZZ', start)
>>> s[start:end]
'1234'
Then you can use regexps with the re module as well, if you want, but that's not necessary in your case.