I'm trying to split a string on the | (bar) separator, for example 123.123.123.123|000.000.000.000, into the individual IP address blocks. But the string gets split at every character instead of at |.
scala> "123.123.123.123|000.000.000.000".split("|")
res30: Array[java.lang.String] = Array("", 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3, |, 0, 0, 0, ., 0, 0, 0, ., 0, 0, 0, ., 0, 0, 0)
scala> "123.123.123.123".split("|")
res33: Array[java.lang.String] = Array("", 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3)
So I passed the separator as a Char, and it gives what I intended.
scala> "123.123.123.123|000.000.000.000".split('|')
res31: Array[String] = Array(123.123.123.123, 000.000.000.000)
scala> "123.123.123.123".split('|')
res32: Array[String] = Array(123.123.123.123)
Why does passing a single character instead of a string make such a huge difference?
I've read the Scala docs and StringLike.scala and found no answer.
def split(separators: Array[Char]): Array[String]
def split(separator: Char): Array[String]
Thanks.
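The split method accepts either a String or character(s). The String overload comes from java.lang.String (which is why it is not listed in StringLike.scala), and it interprets its argument as a regular expression. In a regex, "|" is the alternation ('or') operator; with nothing on either side it matches the empty string, so the input is split between every pair of characters and each character ends up in its own bin. Escape it to get a literal delimiter: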
scala> "123.123.123.123|000.000.000.000".split("\\|")
res1: Array[String] = Array(123.123.123.123, 000.000.000.000)
A Char separator is taken literally, so you get the desired result without any fuss.
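For what it's worth, here is a minimal sketch of the usual ways to make the String overload treat | literally. The object and value names (PipeSplit, escaped, charClass, quoted, emptyMatchStarts) are made up for illustration; the calls themselves are the standard String.split, Pattern.quote and Regex.findAllMatchIn. The last value only shows where the bare "|" pattern matches in a short sample string, which is why the unescaped split cuts between every character.

```scala
import java.util.regex.Pattern

object PipeSplit {
  val input = "123.123.123.123|000.000.000.000"

  // Escape the regex metacharacter with a backslash.
  val escaped: Array[String] = input.split("\\|")

  // Put it inside a character class, where | has no special meaning.
  val charClass: Array[String] = input.split("[|]")

  // Let Pattern.quote build a literal pattern from the string.
  val quoted: Array[String] = input.split(Pattern.quote("|"))

  // The bare pattern "|" matches the empty string at every position.
  val emptyMatchStarts: List[Int] =
    "|".r.findAllMatchIn("ab").map(_.start).toList

  def main(args: Array[String]): Unit = {
    Seq(escaped, charClass, quoted).foreach(parts => println(parts.mkString(", ")))
    println(emptyMatchStarts)
  }
}
```

All three variants should give the same two address strings; which one to use is mostly a matter of taste, though Pattern.quote is handy when the delimiter comes from a variable.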
Note that, as om-nom-nom correctly mentioned (but without giving an example), Char separators (enclosed in single quotes ') are also valid:
"123.123.123.123|000.000.000.000".split('|')
I find this to be more obvious/readable. I'm also assuming that this would be faster, since it does not have to invoke the regex parser. But that is speculation of course, and also unnecessary micro-optimization.
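The same literal treatment applies to any Char, including '.', which is also a regex metacharacter. A small sketch, assuming you eventually want the individual octets of each address (IpBlocks, octets and regexDot are hypothetical names, not anything from the standard library):

```scala
object IpBlocks {
  // Hypothetical helper: split on the literal '|' and then on the literal '.'.
  // Both calls use the Char overload, so no regex is involved.
  def octets(line: String): Array[Array[String]] =
    line.split('|').map(_.split('.'))

  // For contrast: as a regex, "." matches every character, so every piece
  // is empty, and split drops trailing empty strings, leaving nothing.
  val regexDot: Array[String] = "123.123.123.123".split(".")

  def main(args: Array[String]): Unit = {
    octets("123.123.123.123|000.000.000.000")
      .foreach(block => println(block.mkString(" ")))
    println(regexDot.length)
  }
}
```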