
Split a String and get each segment's start index

I'm trying to split a String and get all the start indexes of each "word" that I get.

For example, for a String like this:

"Rabbit jumped over a fence and this Rabbit loves carrots"

How can I split it to get each word's start index?

0,7,14,19,21,27,31,36,43,49
JoeDortman asked Dec 17 '22 22:12



2 Answers

You can do it like this:

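val str = "Rabbit jumped over a fence and this Rabbit loves carrots"
// Running sum of (word length + 1 for the space); the final sum points past the end, so drop it
val indexArr = str.split(" ").scanLeft(0)((prev, next) => prev + next.length + 1).dropRight(1)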

Sample Output:

indexArr: Array[Int] = Array(0, 7, 14, 19, 21, 27, 31, 36, 43, 49)
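If you also want each word next to its index, the same scanLeft idea can be wrapped in a small helper. This is only a sketch: the object and method names (WordStarts, wordsWithStarts) are made up for illustration, and it assumes the words are separated by exactly one space.

```scala
object WordStarts {
  // Hypothetical helper: pairs every word with its start index.
  // Assumes the words are separated by exactly one space.
  def wordsWithStarts(s: String): List[(String, Int)] = {
    val words = s.split(" ")
    // Each word moves the offset forward by its length plus one space.
    val starts = words.scanLeft(0)((offset, word) => offset + word.length + 1)
    // scanLeft yields one more element than there are words; zip drops the extra one.
    words.zip(starts).toList
  }
}
```

Calling WordStarts.wordsWithStarts("Rabbit jumped over") should give List((Rabbit,0), (jumped,7), (over,14)).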
Manoj Kumar Dhakad answered Dec 20 '22 10:12



Here is a solution that does not assume a delimiter of length 1, so it also works when the delimiter is wider than one character.

  1. Instead of splitting on a single delimiter FOO, split on a combination of look-behind and look-ahead, (?<=FOO)|(?=FOO), so that the delimiters are kept as separate pieces.
  2. Scan over the tokens and delimiters, accumulating their lengths to obtain the start indices.
  3. Throw away every second entry (the delimiters).

In code:

val txt = "Rabbit jumped over a fence and this Rabbit loves carrots"
val pieces = txt.split("(?= )|(?<= )")
val startIndices = pieces.scanLeft(0){ (acc, w) => acc + w.size }
val tokensWithStartIndices = (pieces zip startIndices).grouped(2).map(_.head)

tokensWithStartIndices foreach println

Result:

(Rabbit,0)
(jumped,7)
(over,14)
(a,19)
(fence,21)
(and,27)
(this,31)
(Rabbit,36)
(loves,43)
(carrots,49)

Here are some intermediate outputs, so you can better understand what's going on in each step:

scala> val txt = "Rabbit jumped over a fence and this Rabbit loves carrots"
txt: String = Rabbit jumped over a fence and this Rabbit loves carrots

scala> val pieces = txt.split("(?= )|(?<= )")
pieces: Array[String] = Array(Rabbit, " ", jumped, " ", over, " ", a, " ", fence, " ", and, " ", this, " ", Rabbit, " ", loves, " ", carrots)

scala> val startIndices = pieces.scanLeft(0){ (acc, w) => acc + w.size }
startIndices: Array[Int] = Array(0, 6, 7, 13, 14, 18, 19, 20, 21, 26, 27, 30, 31, 35, 36, 42, 43, 48, 49, 56)
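To illustrate the point about wider delimiters, here is a sketch of the same three steps with the delimiter passed in as a parameter. The names (TokenStarts, tokensWithStarts) are hypothetical. It assumes the delimiter is a non-empty literal string, that the text starts with a token, and that delimiters never occur back to back; otherwise the "every second entry" step no longer lines up.

```scala
import java.util.regex.Pattern

object TokenStarts {
  // Hypothetical helper: tokens with their start indices for a literal delimiter.
  // Assumes text starts with a token and delimiters are never adjacent.
  def tokensWithStarts(text: String, delim: String): List[(String, Int)] = {
    val q = Pattern.quote(delim)                   // treat the delimiter literally
    val pieces = text.split(s"(?<=$q)|(?=$q)")     // step 1: keep delimiters as pieces
    val starts = pieces.scanLeft(0)(_ + _.length)  // step 2: accumulate lengths
    pieces.zip(starts).grouped(2).map(_.head).toList // step 3: drop the delimiters
  }
}
```

For example, TokenStarts.tokensWithStarts("a, b, c", ", ") should give List((a,0), (b,3), (c,6)).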
Andrey Tyukin answered Dec 20 '22 11:12
