
Splitting a string on a separator splits it into individual characters in Scala

I'm trying to split a string on the | (bar) separator, so that 123.123.123.123|000.000.000.000 is broken into its individual IP addresses. Instead, the string is split into single characters rather than at the |.

scala> "123.123.123.123|000.000.000.000".split("|")
res30: Array[java.lang.String] = Array("", 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3, |, 0, 0, 0, ., 0, 0, 0, ., 0, 0, 0, ., 0, 0, 0)

scala> "123.123.123.123".split("|")
res33: Array[java.lang.String] = Array("", 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3)

So I passed the separator as a Char instead, and that gave the result I intended.

scala> "123.123.123.123|000.000.000.000".split('|')
res31: Array[String] = Array(123.123.123.123, 000.000.000.000)

scala> "123.123.123.123".split('|')
res32: Array[String] = Array(123.123.123.123)

Why does passing a single Char instead of a String make such a big difference?

I've read the Scala docs and StringLike.scala and found no answer.

def split(separators: Array[Char]): Array[String]
def split(separator: Char): Array[String]

Thanks.

eces asked May 27 '13 07:05



2 Answers

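The split method accepts either a String or Char(s). If you pass a String, it is interpreted as a regex, and "|" is the regex alternation ('or') operator. With nothing on either side, it is an alternation of two empty patterns, which matches the empty string at every position, so every character ends up in its own element. Escape it to get a literal delimiter: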

"123.123.123.123|000.000.000.000".split("\\|")
res1: Array[String] = Array(123.123.123.123, 000.000.000.000)

A Char separator is interpreted literally, so you got the desired result without any fuss.
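As a rough sketch of the point above: besides the backslash escape, a character class or java.util.regex.Pattern.quote also makes the bar literal, and the empty-match behaviour of an unescaped "|" can be observed directly. The object and value names here (SplitOnBar, escaped, emptyMatchStarts, and so on) are made up for illustration and are not from the question.

```scala
import java.util.regex.Pattern

// Hypothetical demo object; the names are illustrative only.
object SplitOnBar {
  val input = "123.123.123.123|000.000.000.000"

  // String argument is a regex, so the bar has to be made literal:
  val escaped: Array[String]   = input.split("\\|")               // backslash escape
  val charClass: Array[String] = input.split("[|]")               // character class
  val quoted: Array[String]    = input.split(Pattern.quote("|"))  // quote the whole separator

  // Char argument is taken literally, no escaping needed:
  val byChar: Array[String] = input.split('|')

  // An unescaped "|" is an alternation of two empty patterns, so it
  // finds an empty match at every position of the input, including the end.
  val emptyMatchStarts: List[Int] = "|".r.findAllMatchIn("ab").map(_.start).toList

  def main(args: Array[String]): Unit = {
    Seq(escaped, charClass, quoted, byChar).foreach(a => println(a.mkString(", ")))
    println(emptyMatchStarts)
  }
}
```

All four split calls should give the same two-element array. The empty matches at every index are why the String version in the question put each character into its own element.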

om-nom-nom answered Nov 06 '22 18:11


Note that, as om-nom-nom correctly mentioned (but without giving an example), Char literals (enclosed in single quotes ') are also valid separators:

"123.123.123.123|000.000.000.000".split('|')

I find this more obvious and readable. I'm also assuming it would be faster, since it does not have to invoke the regex parser. That is speculation, of course, and an unnecessary micro-optimization anyway.
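The other signature quoted in the question, split(separators: Array[Char]), behaves the same way for several literal separators at once. A minimal sketch, assuming a made-up input string that mixes | and , (the object name MultiSeparatorSplit and the sample addresses are illustrative only):

```scala
// Hypothetical demo object; the input string is invented for illustration.
object MultiSeparatorSplit {
  val input = "10.0.0.1|10.0.0.2,10.0.0.3"

  // Each Char in the array is treated as a literal separator,
  // so neither '|' nor '.' needs any regex escaping.
  val parts: Array[String] = input.split(Array('|', ','))

  def main(args: Array[String]): Unit =
    println(parts.mkString(", "))
}
```

The dots inside the addresses are left alone because '.' is not in the separator array.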

fresskoma answered Nov 06 '22 19:11