
Working regex fails when using Scala pattern matching

Tags: regex, scala

In the following code, the same pattern matches when the Java API is used, but not when Scala pattern matching is used.

import java.util.regex.Pattern

object Main extends App {
  val text = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"

  val statePatternString = """\/.*\?.*state=([^&\?]*)"""
  val statePattern = statePatternString.r
  val statePatternJ = Pattern.compile(statePatternString)

  val sj = statePatternJ.matcher(text)
  val sjMatch = if (sj.find()) sj.group(1) else ""
  println(s"Java match $sjMatch")

  val ss = statePattern.unapplySeq(text)
  println(s"Scala unapplySeq $ss")
  val sm = statePattern.findFirstIn(text)
  println(s"Scala findFirstIn $sm")

  text match {
    case statePattern(s) =>
      println(s"Scala matching $s")
    case _ =>
      println("Scala not matching")
  }

}

The app output is:

Java match abcde
Scala unapplySeq None
Scala findFirstIn Some(/oAuth.html?state=abcde)
Scala not matching

Using the extractor syntax val statePattern(se) = text throws a scala.MatchError.

What is causing the Scala regex unapplySeq to fail?

Suma asked Mar 22 '16 13:03


1 Answer

A Scala Regex used as an extractor (in a match expression or via unapplySeq) is anchored by default, that is, it requires the whole string to match. Your Java sj.find(), like Scala's findFirstIn, looks for a match anywhere inside the string. Add .unanchored so the Scala regex also allows partial matches in extractor patterns:

val statePattern = statePatternString.r.unanchored
                                       ^^^^^^^^^^^

See IDEONE demo
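
To make the difference concrete, here is a minimal sketch that runs the question's pattern through the extractor both ways. The object name UnanchoredDemo and the helper extract are made up for illustration; only .r, .unanchored and the extractor syntax come from the Scala library.

```scala
import scala.util.matching.Regex

object UnanchoredDemo {
  val text = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"

  // Same pattern as in the question: anchored by default when used as an extractor.
  val anchored: Regex = """\/.*\?.*state=([^&\?]*)""".r
  // Same pattern, but the extractor no longer requires the whole string to match.
  val unanchored: Regex = anchored.unanchored

  // Hypothetical helper: returns the first capture group if the extractor matches.
  def extract(re: Regex, input: String): Option[String] = input match {
    case re(state) => Some(state)
    case _         => None
  }

  def main(args: Array[String]): Unit = {
    println(extract(anchored, text))   // None: "&code=..." is left over
    println(extract(unanchored, text)) // Some(abcde)
  }
}
```

The anchored version fails only because the trailing "&code=..." is not covered by the pattern; the capture group itself is fine.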

From the Scaladoc for Regex.unanchored:

def unanchored: UnanchoredRegex

Create a new Regex with the same pattern, but no requirement that the entire String matches in extractor patterns.

Normally, matching on date behaves as though the pattern were enclosed in anchors, ^pattern$.

The unanchored Regex behaves as though those anchors were removed.

Note that this method does not actually strip any matchers from the pattern.

An alternative solution is to append .* to the end of the pattern so that it consumes the rest of the string. Remember, though, that a dot does not match a newline by default. If the solution should be generic, put the (?s) DOTALL modifier at the beginning of the pattern so that the whole string, including any newline sequences, is matched.
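
A rough sketch of that alternative, assuming the same pattern as in the question. The names TrailingWildcardDemo, state, oneLine and multiLine are invented for this example, and the multi-line input is a contrived string used only to show the newline issue.

```scala
import scala.util.matching.Regex

object TrailingWildcardDemo {
  val base = """\/.*\?.*state=([^&\?]*)"""

  // Anchored regex that swallows the rest of the input with a trailing .*
  val withTail: Regex = (base + ".*").r
  // Same, but (?s) lets the dot match line terminators as well
  val withTailDotAll: Regex = ("(?s)" + base + ".*").r

  val oneLine   = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"
  val multiLine = oneLine + "\nextra" // contrived input containing a newline

  // Hypothetical helper: first capture group if the whole input matches.
  def state(re: Regex, input: String): Option[String] = input match {
    case re(s) => Some(s)
    case _     => None
  }

  def main(args: Array[String]): Unit = {
    println(state(withTail, oneLine))         // Some(abcde)
    println(state(withTail, multiLine))       // None: .* stops at the newline
    println(state(withTailDotAll, multiLine)) // Some(abcde)
  }
}
```

Compared with .unanchored, this keeps the regex anchored and only relaxes what may follow the captured value.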

Wiktor Stribiżew answered Sep 28 '22 08:09