
Looking for a nice way to split an array

I've been looking for a method on Scala's Array similar to String.split, but I haven't been able to find one.

What I want to do is to split an array by a separator.

For example, separating the following array:

val array = Array('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')

using the '\n' separator, should result in:

List(Array(a, b), Array(c, d, e), Array(g))

I know that I can convert the Array to String, and apply split there:

array.mkString.split('\n').map(_.toArray)

but I would prefer to skip the conversion.

The solution I have so far uses span recursively and feels too boilerplate-heavy:

  def splitArray[T](array: Array[T], separator: T): List[Array[T]] = {
    def spanRec(array: Array[T], aggResult: List[Array[T]]): List[Array[T]] = {
      val (firstElement, restOfArray) = array.span(_ != separator)
      if (firstElement.isEmpty) aggResult
      else spanRec(restOfArray.dropWhile(_ == separator), firstElement :: aggResult)
    }
    spanRec(array, List()).reverse
  }

I'm sure there must be something in Scala I'm missing. Any idea?

thanks, Ruben

Ruben asked Jan 11 '13 12:01




2 Answers

This is not the most concise implementation, but it should be fairly performant and it preserves the array type without resorting to reflection. The loop can of course be replaced by recursion.

Since your question doesn't explicitly state what should happen with the separators, I assume that they should not produce any entry in the output list (see the test cases below).

def splitArray[T](xs: Array[T], sep: T): List[Array[T]] = {
  var (res, i) = (List[Array[T]](), 0)

  while (i < xs.length) {    
    var j = xs.indexOf(sep, i)
    if (j == -1) j = xs.length
    if (j != i) res ::= xs.slice(i, j)
    i = j + 1
  }

  res.reverse
}

Some tests:

val res1 =
  // Notice the two consecutive '\n'
  splitArray(Array('a', 'b', '\n', 'c', 'd', 'e', '\n', '\n', 'g', '\n'), '\n')

println(res1)
  // List([C@12189646, [C@c31d6f2, [C@1c16b01f)
res1.foreach(ar => {ar foreach print; print(" ")})
  // ab cde g


// No separator
val res2 = splitArray(Array('a', 'b'), '\n')
println(res2)
  // List([C@3a2128d0)
res2.foreach(ar => {ar foreach print; print(" ")})
  // ab


// Only separators
val res3 = splitArray(Array('\n', '\n'), '\n')
println(res3)
  // List()
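
The answer mentions that the loop can be replaced by recursion. A minimal sketch of that variant follows, assuming the same semantics as splitArray above (separators never produce an entry, so consecutive separators yield no empty chunks). The name splitArrayRec and the inner helper loop are hypothetical, chosen only for this illustration.

```scala
import scala.annotation.tailrec

// Hypothetical tail-recursive counterpart of splitArray (same assumed semantics).
def splitArrayRec[T](xs: Array[T], sep: T): List[Array[T]] = {
  @tailrec
  def loop(i: Int, acc: List[Array[T]]): List[Array[T]] =
    if (i >= xs.length) acc.reverse
    else {
      val found = xs.indexOf(sep, i)
      // No further separator: the chunk runs to the end of the array.
      val j = if (found == -1) xs.length else found
      // Skip empty chunks (leading or consecutive separators).
      loop(j + 1, if (j != i) xs.slice(i, j) :: acc else acc)
    }

  loop(0, Nil)
}

// mkString gives readable output instead of the default array toString ([C@...).
val chunks = splitArrayRec(Array('a', 'b', '\n', 'c', 'd', 'e', '\n', '\n', 'g', '\n'), '\n')
println(chunks.map(_.mkString))
  // List(ab, cde, g)
```

The accumulator is built in reverse and flipped once at the end, mirroring the `res ::= ...` followed by `res.reverse` in the loop version.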
Malte Schwerhoff answered Oct 18 '22 19:10


I came up with a solution that aims at the following:

  • is generic: you should be able to split an Array just like a Vector, and a collection of Chars just like a collection of arbitrary objects
  • preserves the types of the inputs: an Array[A] gets split in an Array[Array[A]], a Vector[A] gets split in a Vector[Vector[A]]
  • allows a lazy approach if needed (via an Iterator)
  • exposes a compact interface for most cases (just call a split method on your collection)

Before getting to the explanation, note that you can play with the code that follows on Scastie.

The first step is implementing an Iterator that chunks your collection:

import scala.language.higherKinds
import scala.collection.generic.CanBuildFrom

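// Note: CanBuildFrom is only available up to Scala 2.12.
// The collection comes first, in its own parameter list, to match the
// `new Split(as)(delimiter)` calls used below.
final class Split[A, CC[_]](as: CC[A])(delimiter: A => Boolean)(
    implicit view: CC[A] => Seq[A], cbf: CanBuildFrom[Nothing, A, CC[A]])
    extends Iterator[CC[A]] {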

  private[this] var it: Iterator[A] = view(as).iterator

  private def skipDelimiters() = {
    it = it.dropWhile(delimiter)
  }

  skipDelimiters()

  override def hasNext: Boolean = it.hasNext

  override def next(): CC[A] = {
    val builder = cbf()
    builder ++= it.takeWhile(!delimiter(_))
    skipDelimiters()
    builder.result()
  }

}

I'm using a predicate instead of a value to be more flexible in how the collection gets split, especially when splitting a collection of non-scalar values (that is, anything more complex than Chars).

I'm using an implicit view on the collection type so that this applies to every kind of collection that can be seen as a Seq (like Vectors and Arrays), and a CanBuildFrom to build the exact type of collection received as input.

The implementation of the Iterator simply makes sure to drop delimiters and chunk the rest.

We can now use an implicit class to offer a friendly interface and add the split method to all collections, accepting either a predicate or a value as the delimiter:

final implicit class Splittable[A, CC[_]](val as: CC[A])(implicit ev1: CC[A] => Seq[A], ev2: CanBuildFrom[Nothing, A, CC[A]], ev3: CanBuildFrom[Nothing, CC[A], CC[CC[A]]]) {

  def split(delimiter: A => Boolean): CC[CC[A]] = new Split(as)(delimiter).to[CC]

  def split(delimiter: A): CC[CC[A]] = new Split(as)(_ == delimiter).to[CC]

}

Now you can use the split method freely on collections of Chars:

val a = Array('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
val b = List('\n', '\n', '\n')
val c = Vector('\n', 'c', 'd', 'e', '\n', 'g', '\n')
val d = Array('a', 'b', 'c', 'd', 'e', 'g', '\n')
val e = Array('a', 'b', 'c', 'd', 'e', 'g', '\n')

a.split('\n')
b.split('\n')
c.split('\n')
d.split('\n')
e.split('\n')

and arbitrary objects alike:

final case class N(n: Int, isDelimiter: Boolean)

Vector(N(1, false), N(2, false), N(3, true), N(4, false), N(5, false)).split(_.isDelimiter)

Note that by using the iterator directly you use a lazy approach, as you can see if you add a debug print to the next method and try to execute the following:

new Split(Vector('\n', 'c', 'd', 'e', '\n', 'g', '\n'))(_ == '\n').take(1).foreach(println)

If you want, you can add a couple of methods to Splittable that return an Iterator, so that the lazy approach is also available directly on the collection.
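
Since CanBuildFrom was removed in Scala 2.13, here is a rough sketch of the same "skip delimiters, then chunk the rest" idea that avoids it. It is not the code above: it gives up preserving the input collection type (every chunk is a Vector), and it uses a BufferedIterator to peek at the next element rather than takeWhile and dropWhile. The name splitIterator is hypothetical.

```scala
// Hypothetical helper: lazily splits any Iterator[A] into Vector chunks.
def splitIterator[A](source: Iterator[A])(isDelimiter: A => Boolean): Iterator[Vector[A]] =
  new Iterator[Vector[A]] {
    // buffered lets us inspect `head` without consuming it.
    private val it = source.buffered

    private def skipDelimiters(): Unit =
      while (it.hasNext && isDelimiter(it.head)) it.next()

    override def hasNext: Boolean = {
      skipDelimiters()
      it.hasNext
    }

    override def next(): Vector[A] = {
      skipDelimiters()
      if (!it.hasNext) throw new NoSuchElementException("no more chunks")
      val builder = Vector.newBuilder[A]
      // Consume elements up to (but not including) the next delimiter.
      while (it.hasNext && !isDelimiter(it.head)) builder += it.next()
      builder.result()
    }
  }

// Only the first chunk is built; the rest of the input is left unconsumed.
splitIterator(Iterator('\n', 'c', 'd', 'e', '\n', 'g', '\n'))(_ == '\n')
  .take(1)
  .foreach(chunk => println(chunk.mkString))
  // prints: cde
```

Working on a plain Iterator keeps the sketch independent of the collection hierarchy; callers can pass `someCollection.iterator` and convert the result with `toList` or `toVector` when a strict collection is needed.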

stefanobaghino answered Oct 18 '22 20:10