Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Indices of a substring in Smalltalk

It seems Smalltalk implementations misses an algorithm which return all the indices of a substring in a String. The most similar ones returns only one index of an element, for example : firstIndexesOf:in: , findSubstring:, findAnySubstring: variants.

There are implementations in Ruby but the first one relies on a Ruby hack, the second one does not work ignoring overlapping Strings and the last one uses an Enumerator class which I don't know how to translate to Smalltalk. I wonder if this Python implementation is the best path to start since considers both cases, overlapping or not and does not uses regular expressions.

My goal is to find a package or method which provides the following behavior:

'ABDCDEFBDAC' indicesOf: 'BD'. "#(2 8)"

When overlapping is considered:

'nnnn' indicesOf: 'nn' overlapping: true. "#(0 2)"

When overlapping is not considered:

'nnnn' indicesOf 'nn' overlapping: false. "#(0 1 2)"

In Pharo, when a text is selected in a Playground, a scanner detects the substring and highlights matches. However I couldn't find a String implementation of this.

My best effort so far results in this implementation in String (Pharo 6):

indicesOfSubstring: subString
  | indices i |

  indices := OrderedCollection new: self size.
  i := 0.
  [ (i := self findString: subString startingAt: i + 1) > 0 ] whileTrue: [
    indices addLast: i ].
  ^ indices
like image 371
user1000565 Avatar asked Jul 04 '18 17:07

user1000565


2 Answers

Let me firstly clarify that Smalltalk collections are 1-based, not 0-based. Therefore your examples should read

'nnnn' indexesOf: 'nn' overlapping: false. "#(1 3)"
'nnnn' indexesOf: 'nn' overlapping: true. "#(1 2 3)"

Note that I've also taken notice of @lurker's observation (and have tweaked the selector too).

Now, starting from your code I would change it as follows:

indexesOfSubstring: subString overlapping: aBoolean
  | n indexes i |
  n := subString size.
  indexes := OrderedCollection new.                            "removed the size"
  i := 1.                                                      "1-based"
  [
    i := self findString: subString startingAt: i.             "split condition"
    i > 0]
  whileTrue: [
    indexes add: i.                                            "add: = addLast:"
    i := aBoolean ifTrue: [i + 1] ifFalse: [i + n]].           "new!"
  ^indexes

Make sure you write some few unit tests (and don't forget to exercise the border cases!)

like image 94
Leandro Caniglia Avatar answered Sep 29 '22 05:09

Leandro Caniglia


Edited

It would also be nice if you would tell us what you need to achieve in the "greater picture". Sometimes Smalltalk offers different approaches.

Leandro beat me to the the code (and his code is more efficient), but I have already written it so I'll share it too. Heed his advice on Smalltalk being 1-based => rewritten example.

I have used Smalltalk/X and Pharo 6.1 for the example.

The code would be:

indexesOfSubstring: substringToFind overlapping: aBoolean

    | substringPositions aPosition currentPosition |

    substringPositions := OrderedSet new. "with overlap on you could get multiple same 
              positions in the result when there is more to find in the source string"

    substringToFindSize := substringToFind size. "speed up for large strings"
    aPosition := 1.

    [ self size > aPosition ] whileTrue: [
        currentPosition := self findString: substringToFind startingAt: aPosition.
        (currentPosition = 0) ifTrue: [ aPosition := self size + 1 ] "ends the loop substringToFind is not found"
                             ifFalse: [
                                 substringPositions add: currentPosition.
                                 aBoolean ifTrue: [ aPosition := aPosition + 1 ] "overlapping is on"
                                         ifFalse: [ aPosition := currentPosition + substringToFindSize ] "overlapping is off"
                             ]
    ].

    ^ substringPositions

I have fixed some issues that occured to me. Don't forget to test it as much as you can!

like image 27
tukan Avatar answered Sep 29 '22 04:09

tukan