
Regex matching in Kotlin

Tags: regex, kotlin

How do I match secret_code_data in this string:

xeno://soundcloud/?code=secret_code_data# 

I've tried

val regex = Regex("""xeno://soundcloud/?code=(.*?)#""")
field = regex.find(url)?.value ?: ""

without luck. I suspect the ? before code might be the problem. Should I escape it somehow? Can you help?

ssuukk asked Jan 04 '16 15:01



1 Answer

Here are three options: the first provides a Regex that does what you want, and the other two parse the URL with alternatives to Regex that handle URL component encoding/decoding correctly.

Parsing using Regex

NOTE: The Regex method is unsafe in most use cases because it does not parse the URL into components and then decode each component separately. Normally you cannot decode the whole URL into one string and then parse it safely, because characters that were encoded (such as & or #) might confuse the Regex once they are decoded. This is similar to parsing XHTML using regex (as described here). See the alternatives to Regex below.

Here is a cleaned-up regex, wrapped in a function, that handles more URLs safely. At the end of this post is a unit test you can use with each method.

private val SECRET_CODE_REGEX = """xeno://soundcloud[/]?.*[\?&]code=([^#&]+).*""".toRegex()

fun findSecretCode(withinUrl: String): String? =
        SECRET_CODE_REGEX.matchEntire(withinUrl)?.groups?.get(1)?.value

This regex handles these cases:

  • with and without trailing / in path
  • with and without fragment
  • parameter as first, middle or last in list of parameters
  • parameter as only parameter
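
What the regex does not handle is encoding, which is the caveat from the NOTE above. The following is a minimal sketch of that pitfall and is not part of the original answer: it reuses the pattern above, while the helper names regexCode and regexCodeAfterDecodingWholeUrl and the sample value a%26b (an encoded a&b) are made up for illustration.

```kotlin
import java.net.URLDecoder

// Same pattern as in the answer above.
private val SECRET_CODE_REGEX = """xeno://soundcloud[/]?.*[\?&]code=([^#&]+).*""".toRegex()

// Hypothetical helper: returns capture group 1, or null when the URL does not match.
fun regexCode(url: String): String? =
    SECRET_CODE_REGEX.matchEntire(url)?.groupValues?.get(1)

// Hypothetical helper: decodes the WHOLE URL first and only then applies the regex.
// This is the unsafe order the NOTE warns about.
fun regexCodeAfterDecodingWholeUrl(url: String): String? =
    regexCode(URLDecoder.decode(url, "UTF-8"))
```

With "xeno://soundcloud/?code=a%26b", regexCode returns the still-encoded a%26b, while regexCodeAfterDecodingWholeUrl returns only a, because the decoded & now looks like a parameter separator.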

Note that the idiomatic way to make a regex in Kotlin is someString.toRegex(). It and other extension methods can be found in the Kotlin API Reference.
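
As for the pattern in the question, the suspicion about ? is right. Unescaped, /? means "an optional /" rather than a literal question mark, so the pattern never matches that URL. Even once it matches, .value is the whole match, not the captured group. A small sketch of both points, with hypothetical names chosen only for illustration:

```kotlin
// Pattern from the question: '?' is a quantifier here, making the '/' optional.
private val UNESCAPED = """xeno://soundcloud/?code=(.*?)#""".toRegex()

// Same pattern with the '?' escaped so that it matches a literal question mark.
private val ESCAPED = """xeno://soundcloud/\?code=(.*?)#""".toRegex()

// Hypothetical helpers: whole match with each pattern, then the captured group only.
fun wholeMatchUnescaped(url: String): String? = UNESCAPED.find(url)?.value
fun wholeMatchEscaped(url: String): String? = ESCAPED.find(url)?.value
fun capturedCode(url: String): String? = ESCAPED.find(url)?.groupValues?.get(1)
```

For "xeno://soundcloud/?code=secret_code_data#", wholeMatchUnescaped returns null, wholeMatchEscaped returns the entire URL, and capturedCode returns secret_code_data.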

Parsing using UriBuilder or similar class

Here is an example using the UriBuilder from the Klutter library for Kotlin. This version handles encoding/decoding, including the more modern JavaScript Unicode encodings that the Java standard URI class (which has many issues) does not handle. It is safe and easy, and you don't need to worry about special cases.

Implementation:

fun findSecretCode(withinUrl: String): String? {
    fun isValidUri(uri: UriBuilder): Boolean = uri.scheme == "xeno"
                    && uri.host == "soundcloud"
                    && (uri.encodedPath == "/" || uri.encodedPath.isNullOrBlank())
    val parsed = buildUri(withinUrl)
    return if (isValidUri(parsed)) parsed.decodedQueryDeduped?.get("code") else null
}

The Klutter uy.klutter:klutter-core-jdk6:$klutter_version artifact is small and includes some other extensions, including the modernized URL encoding/decoding. (For $klutter_version, use the most current release.)

Parsing with JDK URI Class

This version is a little longer and shows that you need to parse the raw query string yourself, decode after parsing, and then find the query parameter:

import java.net.URI
import java.net.URLDecoder

fun findSecretCode(withinUrl: String): String? {
    fun isValidUri(uri: URI): Boolean = uri.scheme == "xeno"
            && uri.host == "soundcloud"
            && (uri.rawPath == "/" || uri.rawPath.isNullOrBlank())

    val parsed = URI(withinUrl)
    return if (isValidUri(parsed)) {
        // rawQuery is null when the URL has no query string, hence the safe calls
        parsed.rawQuery?.split('&')?.map {
            val parts = it.split('=')
            val name = parts.firstOrNull() ?: ""
            val value = parts.drop(1).firstOrNull() ?: ""
            // decode each component only after splitting
            URLDecoder.decode(name, Charsets.UTF_8.name()) to URLDecoder.decode(value, Charsets.UTF_8.name())
        }?.firstOrNull { it.first == "code" }?.second
    } else null
}

This could be written as an extension on the URI class itself:

fun URI.findSecretCode(): String? { ... } 

In the body, remove the parsed variable and use this instead, since the receiver already is the URI. Then call it using:

val secretCode = URI(myTestUrl).findSecretCode() 
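
The original leaves the extension body elided. One way it might look, following the description above, is sketched below. This is an assumption about the intended shape rather than the author's code, and it differs in one labeled detail: it splits each pair with limit = 2 so that a value containing = is kept whole.

```kotlin
import java.net.URI
import java.net.URLDecoder

// Sketch of the extension form; inside the body, scheme/host/rawPath/rawQuery refer to `this`.
fun URI.findSecretCode(): String? {
    val valid = scheme == "xeno" &&
            host == "soundcloud" &&
            (rawPath == "/" || rawPath.isNullOrBlank())
    if (!valid) return null

    // rawQuery is null when there is no query string, so the chain uses safe calls.
    return rawQuery?.split('&')
        ?.map { pair ->
            // Assumption for this sketch: limit = 2 keeps any '=' inside the value.
            val parts = pair.split('=', limit = 2)
            val name = URLDecoder.decode(parts[0], "UTF-8")
            val value = URLDecoder.decode(parts.getOrElse(1) { "" }, "UTF-8")
            name to value
        }
        ?.firstOrNull { it.first == "code" }
        ?.second
}
```

With this sketch, URI("xeno://soundcloud/?code=a%26b").findSecretCode() gives the decoded a&b, and a URL without a query string such as xeno://soundcloud/ gives null.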

Unit Tests

Given any of the functions above, run this test to prove it works:

import org.junit.Test
import kotlin.test.assertEquals
import kotlin.test.assertNotEquals

class TestSo34594605 {
    @Test fun testUriBuilderFindsCode() {
        // positive test cases

        val testUrls = listOf("xeno://soundcloud/?code=secret_code_data#",
                "xeno://soundcloud?code=secret_code_data#",
                "xeno://soundcloud/?code=secret_code_data",
                "xeno://soundcloud?code=secret_code_data",
                "xeno://soundcloud?code=secret_code_data&other=fish",
                "xeno://soundcloud?cat=hairless&code=secret_code_data&other=fish",
                "xeno://soundcloud/?cat=hairless&code=secret_code_data&other=fish",
                "xeno://soundcloud/?cat=hairless&code=secret_code_data",
                "xeno://soundcloud/?cat=hairless&code=secret_code_data&other=fish#fragment"
        )

        testUrls.forEach { test ->
            assertEquals("secret_code_data", findSecretCode(test), "source URL: $test")
        }

        // negative test cases, don't pick things up by accident

        val badUrls = listOf("xeno://soundcloud/code/secret_code_data#",
                "xeno://soundcloud?hiddencode=secret_code_data#",
                "http://www.soundcloud.com/?code=secret_code_data")

        badUrls.forEach { test ->
            assertNotEquals("secret_code_data", findSecretCode(test), "source URL: $test")
        }
    }
}

13 revs, 2 users 100% answered Sep 21 '22 20:09