 

Find String Between Two Substrings in Python When There is A Space After First Substring

While there are several posts on StackOverflow that are similar to this, none of them involve a situation when the target string is one space after one of the substrings.

I have the following string (example_string): <insert_randomletters>[?] I want this string.Reduced<insert_randomletters>

I want to extract "I want this string." from the string above. The randomletters will always change, however the quote "I want this string." will always be between [?] (with a space after the last square bracket) and Reduced.

Right now, I can do the following to extract "I want this string".

target_quote_object = re.search('[?](.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text[2:])

This eliminates the ] and the space that always appear at the start of my extracted string, thus only printing "I want this string." However, this solution seems ugly, and I'd rather make re.search() return the correct target string without any modification. How can I do this?

Roymunson asked Mar 31 '18 19:03




4 Answers

Your '[?](.*?)Reduced' pattern matches a literal ?, then captures any 0+ chars other than line break chars, as few as possible up to the first Reduced substring. That [?] is a character class formed with unescaped brackets, and the ? inside a character class is a literal ? char. That is why your Group 1 contains the ] and a space.
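A quick way to confirm this is to print both the whole match and Group 1 for the unescaped pattern. This is only a small check against the example string from the question; the variable names `matched` and `captured` are illustrative.

```python
import re

example_string = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"

# Unescaped, [?] is a character class that matches a single literal "?",
# so the match starts at the "?" and Group 1 starts right after it.
m = re.search('[?](.*?)Reduced', example_string)
matched = m.group(0)
captured = m.group(1)

print(repr(matched))   # '?] I want this string.Reduced'
print(repr(captured))  # '] I want this string.'
```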

To make your regex match [?], you need to escape [ and ? so that they are matched as literal chars (the closing ] needs no escaping once the [ is escaped). Besides, you need to add a space after ] so that it does not land in Group 1. A better idea is to use \s* (0 or more whitespace chars) or \s+ (1 or more whitespace chars).

Use

re.search(r'\[\?]\s*(.*?)Reduced', example_string)

See the regex demo.

import re
rx = r"\[\?]\s*(.*?)Reduced"
s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
m = re.search(rx, s)  # reuse the compiled-in-place pattern string defined above
if m:
    print(m.group(1))
# => I want this string.

See the Python demo.

Wiktor Stribiżew answered Oct 09 '22 03:10


Regex may not be necessary for this, provided your string is in a consistent format:

mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

res = mystr.split('Reduced')[0].split('] ')[1]

# 'I want this string.'
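
If either marker might be missing, the chained split() raises IndexError or returns the wrong piece without warning. A minimal sketch of a more defensive variant, assuming you would rather get None in that case, can use str.partition; the helper name `between` is hypothetical and not part of the answer above.

```python
def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    # partition returns (before, separator, after); separator is '' if not found
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    target, found_end, _ = rest.partition(end)
    return target if found_end else None


mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'
res = between(mystr, '[?] ', 'Reduced')
print(res)  # I want this string.
```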
jpp answered Oct 09 '22 01:10


The solution turned out to be:

target_quote_object = re.search('] (.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text)

However, Wiktor's solution is better.

Roymunson answered Oct 09 '22 02:10


You could (or should) use a positive lookbehind, (?<=\[\?\]):


import re
pattern=r'(?<=\[\?\])(\s\w.+?)Reduced'

string_data='<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

print(re.findall(pattern,string_data)[0].strip())

output:

I want this string.
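
As a variation, the space can be moved into the lookbehind and Reduced into a lookahead, so the match itself is the target and no .strip() or capturing group is needed. This is only a sketch that assumes exactly one space follows [?], because Python's re module requires lookbehinds to be fixed-width; the name `found` is illustrative.

```python
import re

string_data = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

# (?<=\[\?\] )  fixed-width lookbehind: "[?]" followed by exactly one space
# .*?           as few chars as possible
# (?=Reduced)   lookahead: stop right before "Reduced" without consuming it
m = re.search(r'(?<=\[\?\] ).*?(?=Reduced)', string_data)
found = m.group(0) if m else None
print(found)  # I want this string.
```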
Aaditya Ura answered Oct 09 '22 03:10