
Python: Find a string between two strings, repeatedly

I'm new to Python and still learning regular expressions, so this may seem trivial to a regex expert. My question is a generalization of this question about finding a string between two strings: what if the pattern (initial_substring + substring_to_find + end_substring) is repeated many times in a long string? For example:

import re

test = 'someth1 var="this" someth2 var="that" '
result = re.search('var=(.*) ', test)
print(result.group(1))
# prints: "this" someth2 var="that"

Instead, I'd like to get a list like ["this","that"]. How can I do it?

asked Feb 17 '17 16:02 by Nonancourt



2 Answers

Use re.findall():

result = re.findall(r'var="(.*?)"', test)
print(result)  # ['this', 'that']

If a value can span multiple lines, pass the re.DOTALL flag so that . also matches newline characters:

re.findall(r'var="(.*?)"', test, re.DOTALL)
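To see what the flag changes, here is a minimal sketch. The input string `multiline_test` is a hypothetical example (not from the question) in which the second value contains a newline; the variable names are illustrative only.

```python
import re

# Hypothetical input: the second quoted value spans two lines.
multiline_test = 'someth1 var="this" someth2 var="that\nand more" '

# Without re.DOTALL, '.' does not match '\n', so the second value is skipped.
without_flag = re.findall(r'var="(.*?)"', multiline_test)

# With re.DOTALL, '.' also matches '\n', so both values are captured.
with_flag = re.findall(r'var="(.*?)"', multiline_test, re.DOTALL)

print(without_flag)  # ['this']
print(with_flag)     # ['this', 'that\nand more']
```

If values never contain newlines, the flag makes no difference and can be left out.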
answered Oct 21 '22 06:10 by zwer


The problem with your current regex is that the capture group (.*) is greedy: it matches as many characters as it can. After the first var= in your string, the group consumes everything up to the last space, which is why you get the rest of the string back.
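As a small illustration of that greedy behaviour, the sketch below runs the question's pattern next to a non-greedy variant (`.*?`). The variable names `greedy` and `lazy` are made up for this example.

```python
import re

test = 'someth1 var="this" someth2 var="that" '

# Greedy: '.*' grabs as much as possible, then backs off to the LAST space.
greedy = re.search(r'var=(.*) ', test).group(1)

# Non-greedy: '.*?' grabs as little as possible, stopping at the FIRST space.
lazy = re.search(r'var=(.*?) ', test).group(1)

print(greedy)  # "this" someth2 var="that"
print(lazy)    # "this"
```

Note that both results still include the surrounding double quotes, because the question's pattern does not match them outside the group.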

If you instead narrow the expression so that the group only accepts word characters and whitespace, as in var="([\w\s]+)", the match can no longer run past the closing quote. Combined with re.findall(), that line of Python becomes:

result = re.findall(r'var="([\w\s]+)"', test)
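One caveat with the character class is that it skips values containing anything other than word characters and whitespace. The sketch below compares it with a negated character class, `[^"]*`, which is an alternative not mentioned in the answer; the `punctuated` input is a hypothetical example.

```python
import re

test = 'someth1 var="this" someth2 var="that" '

# On the question's input, the character class finds both values.
simple = re.findall(r'var="([\w\s]+)"', test)

# Hypothetical input where one value contains a hyphen.
punctuated = 'a var="this-one" b var="that" '

# [\w\s]+ cannot match '-', so the first value is dropped entirely.
strict = re.findall(r'var="([\w\s]+)"', punctuated)

# [^"]* accepts any character except the closing quote.
negated = re.findall(r'var="([^"]*)"', punctuated)

print(simple)   # ['this', 'that']
print(strict)   # ['that']
print(negated)  # ['this-one', 'that']
```

Which pattern fits best depends on what the values may contain; the negated class also matches empty values such as var="", while `[\w\s]+` requires at least one character.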
answered Oct 21 '22 06:10 by m_callens