How to split a String in Scala but keep the parts matching the regular expression?

Tags: regex, scala

My question is the same as "Split string including regular expression match", but for Scala. Unfortunately, the JavaScript solution from that question doesn't work in Scala.

I am parsing some text. Let's say I have this string:

"hello world <1> this is some random text <3> foo <12>"

I would like to get the following Seq: "hello world" :: "<1>" :: "this is some random text" :: "<3>" :: "foo" :: "<12>" :: Nil.

Note that I am splitting the string whenever I encounter a <number> sequence, and each such sequence should be kept as an element of its own.

asked Nov 13 '13 at 23:11 by Noel Yap


2 Answers

scala> val s = "hello wold <1> this is some random text <3> foo <12>"
s: java.lang.String = hello wold <1> this is some random text <3> foo <12>

scala> s.split("""((?=<\d{1,3}>)|(?<=<\d{1,3}>))""")
res0: Array[java.lang.String] = Array(hello wold , <1>,  this is some random text , <3>,  foo , <12>)
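The split above keeps the spaces around each text piece, whereas the question's expected Seq has them trimmed. A minimal sketch of closing that gap, assuming it is acceptable to trim every piece and drop any empty ones; the object and method names are hypothetical. It uses the same bounded look-ahead/look-behind split points as this answer, without the redundant outer group:

```scala
object LookaroundSplit {
  // Zero-width split points: just before a marker (look-ahead) and just
  // after one (look-behind). String.split uses Java's regex engine, which
  // needs a bounded look-behind, hence \d{1,3} rather than \d+.
  private val SplitPoints = """(?=<\d{1,3}>)|(?<=<\d{1,3}>)"""

  // Splits on the markers, keeps them as separate elements, trims the text
  // pieces and drops any pieces that end up empty.
  def splitKeeping(s: String): Seq[String] =
    s.split(SplitPoints).toSeq.map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit =
    println(splitKeeping("hello world <1> this is some random text <3> foo <12>"))
}
```

Because the look-behind must stay bounded, this version still only recognises markers with one to three digits; something like "<1234>" is left inside the surrounding text.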

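Note that \d{1,3} is deliberate. Someone suggested editing it to \d+, but that doesn't work: Java's regex engine, which Scala uses, requires a look-behind to have an obvious maximum length, so an unbounded quantifier inside (?<=...) is rejected (see this question). The trade-off is that markers with more than three digits, such as <1234>, are not split off.

scala> s.split("""((?=<\d+>)|(?<=<\d+>))""")
java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 19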
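If markers can have any number of digits, one option that avoids look-behind altogether is to iterate over the matches with Regex.findAllMatchIn and cut the string between them. This is a swapped-in technique rather than the one in this answer; the sketch assumes the same trimming and empty-piece filtering as the question's expected Seq, and its names are hypothetical:

```scala
import scala.util.matching.Regex

object SplitKeepingMatches {
  // "<", one or more digits, ">". An unbounded \d+ is fine here because no
  // look-behind is involved.
  val Marker: Regex = """<\d+>""".r

  // Returns the text between matches and the matches themselves, in their
  // original order. Text pieces are trimmed and empty ones are dropped.
  def splitKeeping(s: String, pattern: Regex = Marker): Seq[String] = {
    val parts = Seq.newBuilder[String]
    var last = 0 // end index of the previous match
    for (m <- pattern.findAllMatchIn(s)) {
      parts += s.substring(last, m.start) // text before this match
      parts += m.matched                  // the match itself
      last = m.end
    }
    parts += s.substring(last)            // text after the final match
    parts.result().map(_.trim).filter(_.nonEmpty)
  }

  def main(args: Array[String]): Unit =
    println(splitKeeping("hello world <1> this is some random text <3> foo <12>"))
}
```

Since the pattern is only ever searched forward, any regex can be passed as `pattern`, including ones Java would reject inside a look-behind.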
answered Nov 15 '22 at 08:11 by Akos Krivachy


Here's a quick but somewhat hacky solution:

scala> val str = "hello wold <1> this is some random text <3> foo <12>"
str: String = hello wold <1> this is some random text <3> foo <12>

scala> str.replaceAll("<\\d+>", "_$0_").split("_")
res0: Array[String] = Array("hello wold ", <1>, " this is some random text ", <3>, " foo ", <12>)

The problem with this solution, of course, is that it gives the underscore character a special meaning: if an underscore occurs naturally in the original string, you'll get wrong results. So you either have to choose another marker sequence that you are sure won't occur in the original string, or add some escaping and unescaping.
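A minimal sketch of the "choose another marker" route, assuming the ASCII unit-separator control character (U+001F) never appears in the input; that marker choice and the names are hypothetical. It checks the assumption with require instead of silently returning wrong pieces, and quotes the marker so a different one can be substituted safely:

```scala
import java.util.regex.{Matcher, Pattern}

object MarkerSplit {
  // Hypothetical marker: U+001F, a control character that ordinary text
  // rarely contains. Any string known to be absent from the input would do.
  private val Sep: String = 31.toChar.toString

  def splitKeeping(s: String): Seq[String] = {
    // Fail loudly if the assumption about the input is violated.
    require(!s.contains(Sep), "input already contains the marker")
    // Quote the marker so it is literal in both the replacement string and
    // the split regex, whatever marker is chosen.
    val sepReplacement = Matcher.quoteReplacement(Sep)
    s.replaceAll("""<\d+>""", sepReplacement + "$0" + sepReplacement)
      .split(Pattern.quote(Sep))
      .toSeq
      .map(_.trim)
      .filter(_.nonEmpty)
  }

  def main(args: Array[String]): Unit =
    println(splitKeeping("hello world <1> this is some random text <3> foo <12>"))
}
```

Unlike the underscore version, an input such as "a_b <2> c" now splits into "a_b", "<2>" and "c".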

Another solution uses lookahead and lookbehind patterns, as described in this question.

answered Nov 15 '22 at 09:11 by ghik