It seems Smalltalk implementations misses an algorithm which return all the indices of a substring in a String. The most similar ones returns only one index of an element, for example : firstIndexesOf:in: , findSubstring:, findAnySubstring: variants.
There are implementations in Ruby but the first one relies on a Ruby hack, the second one does not work ignoring overlapping Strings and the last one uses an Enumerator class which I don't know how to translate to Smalltalk. I wonder if this Python implementation is the best path to start since considers both cases, overlapping or not and does not uses regular expressions.
My goal is to find a package or method which provides the following behavior:
'ABDCDEFBDAC' indicesOf: 'BD'. "#(2 8)"
When overlapping is considered:
'nnnn' indicesOf: 'nn' overlapping: true. "#(0 2)"
When overlapping is not considered:
'nnnn' indicesOf 'nn' overlapping: false. "#(0 1 2)"
In Pharo, when a text is selected in a Playground, a scanner detects the substring and highlights matches. However I couldn't find a String implementation of this.
My best effort so far results in this implementation in String (Pharo 6):
indicesOfSubstring: subString
| indices i |
indices := OrderedCollection new: self size.
i := 0.
[ (i := self findString: subString startingAt: i + 1) > 0 ] whileTrue: [
indices addLast: i ].
^ indices
Let me firstly clarify that Smalltalk collections are 1-based, not 0-based. Therefore your examples should read
'nnnn' indexesOf: 'nn' overlapping: false. "#(1 3)"
'nnnn' indexesOf: 'nn' overlapping: true. "#(1 2 3)"
Note that I've also taken notice of @lurker's observation (and have tweaked the selector too).
Now, starting from your code I would change it as follows:
indexesOfSubstring: subString overlapping: aBoolean
| n indexes i |
n := subString size.
indexes := OrderedCollection new. "removed the size"
i := 1. "1-based"
[
i := self findString: subString startingAt: i. "split condition"
i > 0]
whileTrue: [
indexes add: i. "add: = addLast:"
i := aBoolean ifTrue: [i + 1] ifFalse: [i + n]]. "new!"
^indexes
Make sure you write some few unit tests (and don't forget to exercise the border cases!)
Edited
It would also be nice if you would tell us what you need to achieve in the "greater picture". Sometimes Smalltalk offers different approaches.
Leandro beat me to the the code (and his code is more efficient), but I have already written it so I'll share it too. Heed his advice on Smalltalk being 1-based => rewritten example.
I have used Smalltalk/X and Pharo 6.1 for the example.
The code would be:
indexesOfSubstring: substringToFind overlapping: aBoolean
| substringPositions aPosition currentPosition |
substringPositions := OrderedSet new. "with overlap on you could get multiple same
positions in the result when there is more to find in the source string"
substringToFindSize := substringToFind size. "speed up for large strings"
aPosition := 1.
[ self size > aPosition ] whileTrue: [
currentPosition := self findString: substringToFind startingAt: aPosition.
(currentPosition = 0) ifTrue: [ aPosition := self size + 1 ] "ends the loop substringToFind is not found"
ifFalse: [
substringPositions add: currentPosition.
aBoolean ifTrue: [ aPosition := aPosition + 1 ] "overlapping is on"
ifFalse: [ aPosition := currentPosition + substringToFindSize ] "overlapping is off"
]
].
^ substringPositions
I have fixed some issues that occured to me. Don't forget to test it as much as you can!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With