I've been looking for a method on Scala's Array similar to String.split, but I haven't been able to find one.
What I want to do is to split an array by a separator.
For example, splitting the following array:
val array = Array('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
on the '\n' separator should result in:
List(Array(a, b), Array(c, d, e), Array(g))
I know I can convert the Array to a String and apply split there:
array.mkString.split('\n').map(_.toArray)
but I would prefer to skip the conversion.
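A note on the conversion route above: it only works for Array[Char], and String.split keeps an empty piece for every leading or doubled separator (only trailing empty pieces are dropped). A minimal sketch of that route with the empty pieces filtered out, assuming empty chunks are unwanted; the names MkStringSplit and splitChars are hypothetical:

```scala
object MkStringSplit {
  // Hypothetical helper: round-trips through a String, so it only works for
  // Array[Char]. String.split keeps empty pieces for leading or consecutive
  // separators (only trailing ones are dropped), hence the filter.
  def splitChars(array: Array[Char], sep: Char): List[Array[Char]] =
    array.mkString
      .split(sep)
      .filter(_.nonEmpty)
      .map(_.toCharArray)
      .toList
}
```

For example, MkStringSplit.splitChars(Array('\n', 'a', '\n', '\n', 'b'), '\n') yields two arrays, a and b; without the filter it would also contain two empty arrays.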
The solution I have so far uses span recursively, but it involves too much boilerplate:
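def splitArray[T](array: Array[T], separator: T): List[Array[T]] = {
  // Invariant: every array passed to spanRec is empty or starts with a non-separator
  def spanRec(array: Array[T], aggResult: List[Array[T]]): List[Array[T]] = {
    val (firstElement, restOfArray) = array.span(_ != separator)
    if (firstElement.isEmpty) aggResult
    else spanRec(restOfArray.dropWhile(_ == separator), firstElement :: aggResult)
  }
  // Drop leading separators, otherwise the first span is empty and the recursion stops early
  spanRec(array.dropWhile(_ == separator), List()).reverse
}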
I'm sure there must be something in Scala I'm missing. Any idea?
Thanks, Ruben
This is not the most concise implementation, but it should perform reasonably well and it preserves the array type without resorting to reflection. The loop can of course be replaced by recursion.
Since your question doesn't explicitly state what should happen with the separators, I assume they should not produce any entries in the output list (see the test cases below).
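def splitArray[T](xs: Array[T], sep: T): List[Array[T]] = {
  var res = List[Array[T]]() // chunks, collected in reverse order
  var i = 0                  // start index of the current chunk
  while (i < xs.length) {
    var j = xs.indexOf(sep, i)         // next separator at or after i
    if (j == -1) j = xs.length         // no separator left: the chunk runs to the end
    if (j != i) res ::= xs.slice(i, j) // skip empty chunks (consecutive separators)
    i = j + 1
  }
  res.reverse
}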
Some tests:
// Notice the two consecutive '\n'
val res1 = splitArray(Array('a', 'b', '\n', 'c', 'd', 'e', '\n', '\n', 'g', '\n'), '\n')
println(res1)
// List([C@12189646, [C@c31d6f2, [C@1c16b01f)
res1.foreach(ar => {ar foreach print; print(" ")})
// ab cde g
// No separator
val res2 = splitArray(Array('a', 'b'), '\n')
println(res2)
// List([C@3a2128d0)
res2.foreach(ar => {ar foreach print; print(" ")})
// ab
// Only separators
val res3 = splitArray(Array('\n', '\n'), '\n')
println(res3)
// List()
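Since the loop can be replaced by recursion, here is a minimal sketch of a tail-recursive equivalent, assuming the same semantics as above (separators never produce entries). SplitRec is a hypothetical object name chosen to avoid clashing with the splitArray defined above:

```scala
import scala.annotation.tailrec

object SplitRec {
  // Same indexOf/slice logic as the while loop, written as a tail-recursive helper.
  def splitArray[T](xs: Array[T], sep: T): List[Array[T]] = {
    @tailrec
    def loop(i: Int, acc: List[Array[T]]): List[Array[T]] =
      if (i >= xs.length) acc.reverse
      else {
        val found = xs.indexOf(sep, i)
        val j = if (found == -1) xs.length else found         // end of the current chunk
        val next = if (j > i) xs.slice(i, j) :: acc else acc // skip empty chunks
        loop(j + 1, next)
      }
    loop(0, Nil)
  }
}
```

Mapping each Array[Char] to a String makes the result readable: SplitRec.splitArray(Array('a', 'b', '\n', 'c'), '\n').map(_.mkString) prints as List(ab, c) instead of a list of [C@... references.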
I came up with a solution that aims at the following:
- it is generic: you can split an Array just like a Vector, and a collection of Chars just like a collection of arbitrary objects
- it preserves the input type: an Array[A] gets split into an Array[Array[A]], a Vector[A] into a Vector[Vector[A]]
- it allows a lazy approach if needed (via an Iterator)
- it exposes a compact interface for the common case (just call a split method on your collection)
Before getting to the explanation, note that you can play with the code that follows on Scastie.
The first step is implementing an Iterator that chunks your collection:
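// Scala 2.12 code: CanBuildFrom was removed from the standard library in Scala 2.13
import scala.language.higherKinds
import scala.collection.generic.CanBuildFrom
// The collection comes first so that A is inferred before the delimiter lambda is typed,
// matching the call sites new Split(as)(delimiter) below
final class Split[A, CC[_]](as: CC[A])(delimiter: A => Boolean)(
    implicit view: CC[A] => Seq[A], cbf: CanBuildFrom[Nothing, A, CC[A]])
    extends Iterator[CC[A]] {
  private[this] var it: Iterator[A] = view(as).iterator
  // Drop any run of delimiters before the next chunk
  private def skipDelimiters(): Unit = {
    it = it.dropWhile(delimiter)
  }
  skipDelimiters()
  override def hasNext: Boolean = it.hasNext
  override def next(): CC[A] = {
    val builder = cbf()
    // span returns the chunk and the remainder as new iterators, so the old
    // iterator is never reused after takeWhile (which Iterator leaves undefined)
    val (chunk, rest) = it.span(!delimiter(_))
    builder ++= chunk
    it = rest
    skipDelimiters()
    builder.result()
  }
}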
I'm using a predicate instead of a value to be more flexible in how the collection gets split, especially when splitting a collection of non-scalar values (unlike Chars).
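To illustrate what a predicate buys, here is a sketch of an eager, predicate-based split over any Seq. It is an illustration under the same "separators produce no entries" assumption, not the answer's Split class; PredicateSplit and splitBy are hypothetical names. With _.isWhitespace a single call splits on both spaces and tabs, which one delimiter value cannot express:

```scala
object PredicateSplit {
  // Eagerly splits xs on every element matching isDelimiter.
  // Runs of delimiters yield no empty chunks. The recursion is not
  // tail-recursive, so this sketch is meant for modest input sizes.
  def splitBy[A](xs: Seq[A])(isDelimiter: A => Boolean): List[Seq[A]] = {
    val rest = xs.dropWhile(isDelimiter)
    if (rest.isEmpty) Nil
    else {
      // chunk: elements up to the next delimiter; tail: the delimiter onwards
      val (chunk, tail) = rest.span(a => !isDelimiter(a))
      chunk :: splitBy(tail)(isDelimiter)
    }
  }
}
```

For example, PredicateSplit.splitBy("ab cd\te".toList)(_.isWhitespace).map(_.mkString) yields List(ab, cd, e).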
I'm using an implicit view on the collection type so that this applies to every kind of collection that can be seen as a Seq (like Vectors and Arrays), and a CanBuildFrom to build the exact type of collection received as input.
The implementation of the Iterator simply drops delimiters and chunks the rest.
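The same drop-then-chunk idea can be sketched without CanBuildFrom (which Scala 2.13 removed), at the cost of always producing Vectors instead of the input collection type. This sketch peeks at the next element with a BufferedIterator rather than re-reading an iterator after takeWhile; BufferedSplit and Chunks are hypothetical names:

```scala
object BufferedSplit {
  // Lazily splits source into chunks separated by elements matching isDelimiter.
  // A BufferedIterator lets us look at the next element (head) without
  // consuming it, so a delimiter is never swallowed by accident.
  final class Chunks[A](source: Iterator[A], isDelimiter: A => Boolean)
      extends Iterator[Vector[A]] {
    private val it = source.buffered

    // Drop any run of delimiters in front of the next chunk.
    private def skipDelimiters(): Unit =
      while (it.hasNext && isDelimiter(it.head)) it.next()

    override def hasNext: Boolean = { skipDelimiters(); it.hasNext }

    override def next(): Vector[A] = {
      if (!hasNext) throw new NoSuchElementException("no more chunks")
      val chunk = Vector.newBuilder[A]
      // Take elements up to, but not including, the next delimiter.
      while (it.hasNext && !isDelimiter(it.head)) chunk += it.next()
      chunk.result()
    }
  }
}
```

For example, new BufferedSplit.Chunks(Vector('\n', 'c', 'd', 'e', '\n', 'g', '\n').iterator, (c: Char) => c == '\n').toList gives List(Vector(c, d, e), Vector(g)).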
We can now use an implicit class to offer a friendly interface and add the split method to all collections, accepting either a predicate or a value as the delimiter:
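// In Scala 2, implicit classes must be defined inside an object, class or trait
final implicit class Splittable[A, CC[_]](val as: CC[A])(implicit ev1: CC[A] => Seq[A],
    ev2: CanBuildFrom[Nothing, A, CC[A]], ev3: CanBuildFrom[Nothing, CC[A], CC[CC[A]]]) {
  // Split on every element satisfying the predicate
  def split(delimiter: A => Boolean): CC[CC[A]] = new Split(as)(delimiter).to[CC]
  // Split on every element equal to the given value
  def split(delimiter: A): CC[CC[A]] = new Split(as)(_ == delimiter).to[CC]
}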
Now you can call the method on collections of Chars:
val a = Array('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
val b = List('\n', '\n', '\n')
val c = Vector('\n', 'c', 'd', 'e', '\n', 'g', '\n')
val d = Array('a', 'b', 'c', 'd', 'e', 'g', '\n')
val e = Array('a', 'b', 'c', 'd', 'e', 'g', '\n')
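a.split('\n') // Array(Array(a, b), Array(c, d, e), Array(g))
b.split('\n') // List()
c.split('\n') // Vector(Vector(c, d, e), Vector(g))
d.split('\n') // Array(Array(a, b, c, d, e, g))
e.split('\n') // Array(Array(a, b, c, d, e, g))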
and arbitrary objects alike:
final case class N(n: Int, isDelimiter: Boolean)
Vector(N(1, false), N(2, false), N(3, true), N(4, false), N(5, false)).split(_.isDelimiter)
Note that by using the iterator directly you get a lazy approach, as you can see if you add a debug print to the next method and execute the following:
new Split(Vector('\n', 'c', 'd', 'e', '\n', 'g', '\n'))(_ == '\n').take(1).foreach(println)
If you want, you can add a couple of methods to Splittable that return an Iterator, so that the lazy approach is exposed directly through it as well.
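One possible shape for those methods, written as a standalone extension on plain Iterator rather than as a change to Splittable (which depends on Scala 2.12's CanBuildFrom). LazySplitSyntax, SplitIteratorOps, splitIterator and splitIteratorBy are hypothetical names. Because each chunk is built only when next() is called, taking a couple of chunks from an infinite source terminates:

```scala
object LazySplitSyntax {
  // Lazy iterator of chunks: only the chunk being returned is materialised
  // (via toList); the rest of source is left unread until it is needed.
  private def chunks[A](source: Iterator[A], isDelimiter: A => Boolean): Iterator[List[A]] =
    new Iterator[List[A]] {
      private var rest: Iterator[A] = source.dropWhile(isDelimiter)
      def hasNext: Boolean = rest.hasNext
      def next(): List[A] = {
        if (!hasNext) throw new NoSuchElementException("no more chunks")
        // span yields the prefix before the next delimiter and the remainder;
        // the old iterator is discarded, as Iterator's reuse rules require.
        val (chunk, tail) = rest.span(a => !isDelimiter(a))
        val result = chunk.toList
        rest = tail.dropWhile(isDelimiter)
        result
      }
    }

  // Hypothetical extension methods; distinct names avoid overloading a
  // value parameter against a lambda parameter.
  implicit class SplitIteratorOps[A](it: Iterator[A]) {
    def splitIterator(delimiter: A): Iterator[List[A]] = chunks(it, (a: A) => a == delimiter)
    def splitIteratorBy(isDelimiter: A => Boolean): Iterator[List[A]] = chunks(it, isDelimiter)
  }
}
```

With import LazySplitSyntax._ in scope, Iterator.from(1).splitIteratorBy(_ % 3 == 0).take(2).toList returns List(List(1, 2), List(4, 5)); an eager split of the same infinite source would never finish.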