Scala: how to split using more than one delimiter

Tags: list, split, scala

I would like to know how I can split a string using more than one delimiter with Scala.

For instance, if I have a list of delimiters:

List("Car", "Red", "Boo", "Foo")

And a string to harvest:

Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed

I would like to be able to output something like:

List(
  ("Car", " foerjfpoekrfopekf "),
  ("Red", " ezokdpzkdpoedkzopke dekpzodk "),
  ("Foo", " azdkpodkzed")
)
Roch asked Oct 12 '12 09:10


2 Answers

You can use the list to create a regular expression and use its split method:

val regex = List("Car", "Red", "Boo", "Foo").mkString("|").r
regex.split("Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed")

That however doesn't tell you which delimiter was used where. If you need that, I suggest you try Scala's parser library.
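
If a parser feels like too much, one way to keep track of the delimiters is to ask the regex for its matches instead of the split pieces. The following is only a sketch under a few assumptions: the names DelimiterMatches and pairs are made up for illustration, the delimiters are passed through Regex.quote so that regex metacharacters in them are taken literally, and any text before the first delimiter is ignored.

```scala
import scala.util.matching.Regex

object DelimiterMatches {
  // Hypothetical helper: pairs each delimiter found in s with the text that
  // follows it, up to the next delimiter or the end of the string.
  def pairs(s: String, delimiters: List[String]): List[(String, String)] =
    if (delimiters.isEmpty) Nil
    else {
      // Quote every delimiter so characters like '.' or '|' match literally.
      val regex = delimiters.map(Regex.quote).mkString("|").r
      val matches = regex.findAllMatchIn(s).toList
      // The text for match i ends where match i + 1 starts (or at the end of s).
      val ends = matches.drop(1).map(_.start) :+ s.length
      matches.zip(ends).map { case (m, end) => (m.matched, s.substring(m.end, end)) }
    }
}
```

Because it works from match positions, this handles a delimiter that occurs several times in the string.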

EDIT:

Or you can use regular expressions to extract one pair at a time like this:

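def split(s: String, l: List[String]): List[(String, String)] = {
  // Alternation of all delimiters, e.g. Car|Red|Boo|Foo
  val delimRegex = l.mkString("|")
  // Group 1: the delimiter; group 2: the text up to the next delimiter (non-greedy);
  // group 3: the rest of the string, starting at the next delimiter
  val R = ("(" + delimRegex + ")(.*?)((" + delimRegex + ").*)?").r
  s match {
    // rest is null when no further delimiter follows, so stop there
    case null => Nil
    case R(delim, text, rest, _) => (delim, text) :: split(rest, l)
    // the string does not start with a delimiter
    case _ => Nil
  }
}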
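
A non-recursive variation on the same idea is to split on a zero-width lookahead, so that each chunk still begins with its delimiter. This is a hedged sketch rather than part of the answer above: LookaheadSplit and pairs are illustrative names, the delimiters are quoted with Regex.quote (an addition, not in the original code), and chunks that do not start with a delimiter are dropped.

```scala
import scala.util.matching.Regex

object LookaheadSplit {
  // Hypothetical helper: splits s in front of every delimiter and then
  // separates each chunk into (delimiter, remaining text).
  def pairs(s: String, delimiters: List[String]): List[(String, String)] =
    if (delimiters.isEmpty) Nil
    else {
      val alternation = delimiters.map(Regex.quote).mkString("|")
      // (?=...) matches the position before a delimiter without consuming it,
      // so the delimiter stays at the start of the following chunk.
      val chunks = s.split("(?=" + alternation + ")").toList
      chunks.flatMap { chunk =>
        // Chunks without a leading delimiter (e.g. text before the first one) yield None.
        delimiters.find(d => chunk.startsWith(d)).map(d => (d, chunk.drop(d.length)))
      }
    }
}
```

Delimiters that overlap or are prefixes of one another may be paired differently from the recursive version, so treat this as a starting point.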
Kim Stebel answered Oct 18 '22 01:10


A bit verbose, but it works.

DEPRECATED VERSION (it has a bug; I left it here because you had already accepted the answer):

def f(s: String, l: List[String], g: (String, List[String]) => Int) = {
    for {
        t <- l
        if (s.contains(t))
        w = s.drop(s.indexOf(t) + t.length)
    } yield (t, w.dropRight(w.length - g(w, l)))
}

def h(s: String, x: String) = if (s.contains(x)) s.indexOf(x) else s.length

def g(s: String, l: List[String]): Int = l match {
    case Nil => s.length
    case x :: xs => math.min(h(s, x), g(s, xs))
}

val l = List("Car", "Red", "Boo", "Foo")

val s = "Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed"

output:

f(s, l, g).foreach(println)
> (Car, foerjfpoekrfopekf )
> (Red, ezokdpzkdpoedkzopke dekpzodk )
> (Foo, azdkpodkzed)

Since f iterates over the list of delimiters, it already returns a List[(String, String)], so no conversion with toList is needed.

EDIT: I just noticed that this code only works if each delimiter appears at most once in the string, because f yields at most one pair per delimiter, taken at its first occurrence. If I had defined s as follows:

val s = "Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed Car more..."

I'd still get the same result, without the additional pair ("Car", " more...").
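
The root of the problem is that the one-argument indexOf only ever reports the first occurrence. A tiny illustration, using a made-up sample string and the hypothetical object name IndexOfDemo:

```scala
object IndexOfDemo {
  // Shortened sample in which "Car" occurs twice (not the string from the question).
  val sample = "Car a Red b Car c"

  // The one-argument overload always finds the first occurrence.
  val first: Int = sample.indexOf("Car")

  // The two-argument overload starts searching at the given index,
  // which is what is needed to reach the later occurrence.
  val second: Int = sample.indexOf("Car", first + 1)
}
```

Here first is 0 and second is 12, so any approach that calls indexOf once per delimiter can never see the second "Car".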

EDIT #2: BUGLESS VERSION. Here's the fixed snippet:

def h(s: String, x: String) = if (s.contains(x)) s.indexOf(x) else s.length

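def multiSplit(str: String, delimiters: List[String]): List[(String, String)] = {
    // Earliest delimiter in str together with its index, or (None, -1) if none occurs
    val del = nextDelimiter(str, delimiters)
    del._1 match {
        case None => Nil
        case Some(x) => {
            // Skip any text before the delimiter as well as the delimiter itself
            // (dropping only x.length assumed the string starts with a delimiter)
            val tmp = str.drop(del._2 + x.length)
            // Keep everything up to the next delimiter, or to the end of the string
            val current = tmp.take(nextDelIndex(tmp, delimiters))
            (x, current) :: multiSplit(tmp.drop(current.length), delimiters)
        }
    }
}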

def nextDelIndex(s: String, l: List[String]): Int = l match {
    case Nil => s.length
    case x :: xs => math.min(h(s, x), nextDelIndex(s, xs))
}

def nextDelimiter(str: String, delimiters: List[String]): (Option[String], Int) = delimiters match {
    case Nil => (None, -1)
    case x :: xs => {
        val next = nextDelimiter(str, xs)
        if (str.contains(x)) {
            val i = str.indexOf(x)
            next._1 match {
                case None => (Some(x), i)
                case _ => if (next._2 < i) next else (Some(x), i)
            }
        } else next
    }
}

output:

multiSplit(s, l).foreach(println)
> (Car, foerjfpoekrfopekf )
> (Red, ezokdpzkdpoedkzopke dekpzodk )
> (Foo, azdkpodkzed)
> (Car, more...)

and now it works :)

gilad hoch answered Oct 18 '22 00:10