
Why is Kotlin String.split with a regex string not the same as Java?

I have the following Java code:

String str = "12+20*/2-4";
String[] arr = str.split("\\p{Punct}");

//expected: arr = {12,20,2,4}

I want the equivalent Kotlin code, but .split("\\p{Punct}") doesn't work. I don't understand the documentation here: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/split.html

Ilya Fedoseev asked Jul 12 '17 18:07


3 Answers

You should use String#split(Regex) instead, for example:

val str = "12+20*/2-4"
val arr = str.split("\\p{Punct}".toRegex())
// arr is ["12","20","","2","4"]: the empty string comes from between '*' and '/'
val arr2 = arr.filter { it.isNotBlank() }
// filtering out the blanks gives ["12","20","2","4"]

Or you can split on runs of one or more punctuation characters by using \\p{Punct}+, for example:

val arr = str.split("\\p{Punct}+".toRegex())
// arr is ["12","20","2","4"]

Or invert the regex and use Regex#findAll to match the numbers themselves instead of the separators. This way you can also pick up negative numbers, for example:

val str = "12+20*/2+(-4)"
val arr = "(?<!\\d)-?[^\\p{Punct}]+".toRegex().findAll(str).map { it.value }.toList()
// arr is ["12","20","2","-4"]: the negative number is found
holi-java answered Oct 07 '22 04:10


For regex behavior, your argument must be of type Regex, not merely a String containing special regex characters.

Most string manipulation methods in Kotlin (replace, split, etc.) can take both String and Regex arguments, but you must convert your String to Regex if you want regex-specific matching.

This conversion can be done using String.toRegex() or Regex(String):

val str = "12+20*/2-4";
str.split("\\p{Punct}".toRegex()) //this
str.split(Regex("\\p{Punct}")) //or this

Currently split is treating your String as a literal delimiter rather than as a regex: it searches for the exact text \p{Punct} in the input, finds no occurrence, and returns the whole string as a single element.
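A minimal sketch of the difference, using the string from the question. The replace comparison at the end is an added illustration with a made-up input ("a.b.c"), not something from the question:

```kotlin
fun main() {
    val str = "12+20*/2-4"

    // String overload: the argument is a literal delimiter.
    // The text \p{Punct} never occurs in str, so nothing is split.
    val literal = str.split("\\p{Punct}")
    println(literal)   // [12+20*/2-4]

    // Regex overload: \p{Punct} is a character class matching one punctuation character.
    val byRegex = str.split(Regex("\\p{Punct}"))
    println(byRegex)   // [12, 20, , 2, 4]

    // The same String-vs-Regex distinction applies to replace (illustrative input).
    println("a.b.c".replace(".", "-"))         // a-b-c  (literal dot)
    println("a.b.c".replace(Regex("."), "-"))  // -----  (regex dot matches every character)
}
```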


As mentioned by @holi-java in their answer, this will match an empty string between * and /, giving ["12","20","","2","4"]. You can use "\\p{Punct}+" as your regex to avoid this. (Note that Java's split produces the same empty string unless the + is included there as well.)
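One related difference may be worth checking when porting from Java. It is offered here as an aside, not as part of the answers above: Java's String.split(regex) discards trailing empty strings, while Kotlin's split keeps them. A small sketch, where the input ending in "-" is hypothetical and chosen only to show the effect:

```kotlin
fun main() {
    val punct = Regex("\\p{Punct}+")

    // The string from the question: no empty elements once + is used.
    println("12+20*/2-4".split(punct))   // [12, 20, 2, 4]

    // Hypothetical input ending in punctuation: Kotlin keeps the trailing empty string.
    val parts = "12+20*/2-4-".split(punct)
    println(parts)                       // [12, 20, 2, 4, ]

    // Dropping empty elements gives, for this input, what Java's split would return.
    println(parts.filter { it.isNotEmpty() })   // [12, 20, 2, 4]
}
```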

River answered Oct 07 '22 04:10


You can call

str.split(Regex("\\p{Punct}"))
tango24 answered Oct 07 '22 05:10