
How to find all occurrences of a substring?

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

    string = "test test test test"

    print(string.find('test'))   # 0
    print(string.rfind('test'))  # 15

    # this is the goal
    print(string.find_all('test'))  # [0, 5, 10, 15]
asked Jan 12 '11 02:01 by nukl



2 Answers

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

    import re
    [m.start() for m in re.finditer('test', 'test test test test')]
    # [0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

    [m.start() for m in re.finditer('(?=tt)', 'ttt')]
    # [0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

    search = 'tt'
    [m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
    # [1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which will be more efficient if you're only iterating through the results once.
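One thing to watch with the regex approach: the search string is used as a pattern, so characters such as . or + are treated as metacharacters rather than literal text. Here is a minimal sketch, assuming you want plain literal substring matching. The helper name find_all_regex and its overlapping flag are made up for illustration; only re.escape and re.finditer come from the standard library.

```python
import re


def find_all_regex(text, sub, overlapping=False):
    """Return a generator of start indexes of the literal string sub in text.

    Hypothetical helper for illustration, not part of the re module.
    """
    if not sub:
        raise ValueError("sub must be a non-empty string")
    # re.escape makes every character in sub match literally.
    escaped = re.escape(sub)
    # A zero-width lookahead lets matches overlap, as in the 'ttt' example.
    pattern = "(?=%s)" % escaped if overlapping else escaped
    return (m.start() for m in re.finditer(pattern, text))


print(list(find_all_regex('test test test test', 'test')))  # [0, 5, 10, 15]
print(list(find_all_regex('ttt', 'tt', overlapping=True)))  # [0, 1]
print(list(find_all_regex('a.b.c a.b.c', '.')))             # [1, 3, 7, 9]
```

Without the escape, the last call would match every character, since an unescaped . matches anything except a newline.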

answered Oct 01 '22 04:10 by moinudin


    >>> help(str.find)
    Help on method_descriptor:

    find(...)
        S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

    def find_all(a_str, sub):
        start = 0
        while True:
            start = a_str.find(sub, start)
            if start == -1: return
            yield start
            start += len(sub) # use start += 1 to find overlapping matches

    list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
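As a sketch building on the same str.find loop (find_all_safe and rfind_all are hypothetical names, and the overlapping flag is an addition of mine, not something from the answer): the first variant rejects an empty sub, which would otherwise never advance start, and the second scans from the right using str.rfind with its end argument.

```python
def find_all_safe(a_str, sub, overlapping=False):
    """Yield start indexes of sub in a_str, scanning left to right."""
    if not sub:
        # An empty sub would never advance start, so the loop would not end.
        # This is a generator, so the error surfaces on the first iteration.
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += step


def rfind_all(a_str, sub):
    """Yield start indexes of non-overlapping matches, scanning right to left."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    end = len(a_str)
    while True:
        # Search only a_str[:end], so the next match cannot overlap the last one.
        pos = a_str.rfind(sub, 0, end)
        if pos == -1:
            return
        yield pos
        end = pos


print(list(find_all_safe('spam spam spam spam', 'spam')))  # [0, 5, 10, 15]
print(list(find_all_safe('ttt', 'tt', overlapping=True)))  # [0, 1]
print(list(rfind_all('test test test test', 'test')))      # [15, 10, 5, 0]
print(list(rfind_all('ttt', 'tt')))                         # [1]
```

Note that the right-to-left scan is not guaranteed to agree with the lookahead expression from the regex answer on every input: for 'tttt' and 'tt', rfind_all yields [2, 0], whereas that expression, as far as I can tell, keeps only index 2.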

answered Oct 01 '22 04:10 by Karl Knechtel