I am new to Scala. I am trying to match a string delimited by double quotes, and I am a bit puzzled by the following behavior:
If I do the following:
val stringRegex = """"([^"]*)"(.*$)"""
val regex = stringRegex.r
val tidyTokens = Array[String]("1", "\"test\"", "'c'", "-23.3")
tidyTokens.foreach {
token => if (token.matches (stringRegex)) println (token + " matches!")
}
I get
"test" matches!
Otherwise, if I do the following:
tidyTokens.foreach {
token => token match {
case regex(token) => println (token + " matches!")
case _ => println ("No match for token " + token)
}
}
I get
No match for token 1
No match for token "test"
No match for token 'c'
No match for token -23.3
Why doesn't "test" match in the second case?
Take your regular expression:
"([^"]*)"(.*$)
When compiled with .r, this string yields a Regex object which, if it matches its input string, must yield two captured strings: one for ([^"]*) and the other for (.*$). Your code
case regex(token) => ...
ought to reflect this, so you probably want
case regex(token, otherStuff) => ...
or just
case regex(token, _) => ...
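To see the two-variable pattern in context, here is a minimal sketch that reruns the question's token loop with case regex(inner, _). It assumes Scala 2.13 or 3; the object name TwoVarPattern and the describe helper are illustrative and do not come from the question.

```scala
object TwoVarPattern {
  // The regex from the question: group 1 is the text between the quotes,
  // group 2 is whatever follows the closing quote (empty for "test").
  val regex = """"([^"]*)"(.*$)""".r

  // One pattern variable per capture group: (inner, _) rather than (inner).
  def describe(token: String): String = token match {
    case regex(inner, _) => token + " matches! (captured: " + inner + ")"
    case _               => "No match for token " + token
  }

  def main(args: Array[String]): Unit = {
    val tidyTokens = Array[String]("1", "\"test\"", "'c'", "-23.3")
    tidyTokens.foreach(t => println(describe(t)))
  }
}
```

Naming the captured variable inner also avoids shadowing the outer token, which in the question's version makes the printed value the captured text rather than the original token.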
Why? The case regex(matchedCaptures...) syntax works because regex is an extractor object with an unapplySeq method. A pattern such as
case regex(token) => ...
translates (roughly) to:
case List(token) => ...
where the list being matched is the one that regex.unapplySeq(inputString) returns (wrapped in a Some):
regex.unapplySeq("\"test\"") // Returns Some(List("test", ""))
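That call can be reproduced directly. This sketch assumes Scala 2.13 or 3, where Regex.unapplySeq(s: CharSequence) returns Option[List[String]]; it compares the list the extractor returns with which patterns actually fire. UnapplySeqDemo and its helper names are hypothetical.

```scala
object UnapplySeqDemo {
  val regex = """"([^"]*)"(.*$)""".r

  // What a `case regex(...)` pattern is matched against:
  // Some(one element per capture group) on a full match, None otherwise.
  def captures(s: String): Option[List[String]] = regex.unapplySeq(s)

  // One variable needs a one-element list, so with two groups this never fires.
  def matchesOneVar(s: String): Boolean = s match {
    case regex(_) => true
    case _        => false
  }

  // Two variables line up with the two capture groups.
  def matchesTwoVars(s: String): Boolean = s match {
    case regex(_, _) => true
    case _           => false
  }

  def main(args: Array[String]): Unit = {
    val token = "\"test\""
    println(captures(token))
    println(matchesOneVar(token))
    println(matchesTwoVars(token))
  }
}
```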
Your regex does match the string "test", but in the case statement the regex extractor's unapplySeq method returns a list of two strings, because that is how many groups the regex captures. A one-element pattern like List(token) cannot match a two-element list, so the case falls through to case _. That's unfortunate, but the compiler can't help you here because regular expressions are compiled from strings at runtime.
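Although the compiler cannot check the arity, the group count can be checked at runtime, for example right where the regex is built. A minimal sketch, assuming Scala 2.13 or 3: it reads groupCount from the underlying java.util.regex.Pattern (exposed as Regex.pattern). The helpers groupCount and requireGroups are hypothetical, not part of scala.util.matching.Regex.

```scala
import scala.util.matching.Regex

object GroupCountCheck {
  // Number of capturing groups in the compiled pattern.
  // Non-capturing groups (?:...) are not counted.
  def groupCount(r: Regex): Int = r.pattern.matcher("").groupCount()

  // Fail fast if a regex meant for `case r(a, b, ...)` has the wrong arity.
  def requireGroups(r: Regex, expected: Int): Regex = {
    val actual = groupCount(r)
    require(actual == expected,
      s"expected $expected capture group(s) but found $actual in ${r.regex}")
    r
  }

  val capturing    = """"([^"]*)"(.*$)""".r
  val nonCapturing = """"([^"]*)"(?:.*$)""".r

  def main(args: Array[String]): Unit = {
    println(groupCount(capturing))
    println(groupCount(nonCapturing))
  }
}
```

This only moves the mistake from a silent non-match to an early exception; the pattern arity in each case clause still has to be kept in sync by hand.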
One alternative would be to use a non-capturing group:
val stringRegex = """"([^"]*)"(?:.*$)"""  // (?:...) makes the second group non-capturing
Then your code would work, because regex (rebuilt with stringRegex.r) will now be an extractor object whose unapplySeq method returns only a single captured group:
tidyTokens foreach {
case regex(token) => println (token + " matches!")
case t => println ("No match for token " + t)
}
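A self-contained version of the fixed loop, assuming Scala 2.13 or 3; NonCapturingFix and describe are illustrative names. The only change from the question's code is the (?:...) group, which leaves one capture group for the one-variable pattern.

```scala
object NonCapturingFix {
  // Trailing group is non-capturing, so unapplySeq yields a one-element list.
  val stringRegex = """"([^"]*)"(?:.*$)"""
  val regex = stringRegex.r

  def describe(token: String): String = token match {
    case regex(inner) => token + " matches! (captured: " + inner + ")"
    case t            => "No match for token " + t
  }

  def main(args: Array[String]): Unit = {
    val tidyTokens = Array[String]("1", "\"test\"", "'c'", "-23.3")
    tidyTokens foreach { t => println(describe(t)) }
  }
}
```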
Have a look at the tutorial on Extractor Objects for a better understanding of how apply / unapply / unapplySeq work.
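As a rough illustration of those three methods, here is a sketch with two hand-written extractors, assuming Scala 2.13 or 3. Quoted and Words are hypothetical names invented for this example. Regex supplies its own unapplySeq, which behaves like Words here: the number of pattern variables must agree with the length of the returned list unless a _* wildcard absorbs the rest.

```scala
// Hypothetical extractor with apply and unapply (exactly one extracted value).
object Quoted {
  // apply: construct a value, here by wrapping text in double quotes.
  def apply(inner: String): String = "\"" + inner + "\""

  // unapply: extract the text between the quotes, or None if the shape doesn't fit.
  def unapply(s: String): Option[String] =
    if (s.length >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
      val inner = s.substring(1, s.length - 1)
      if (inner.contains("\"")) None else Some(inner)
    } else None
}

// Hypothetical extractor with unapplySeq (a variable-length result).
object Words {
  // Splits on whitespace; None for blank input.
  def unapplySeq(s: String): Option[List[String]] = {
    val trimmed = s.trim
    if (trimmed.isEmpty) None else Some(trimmed.split("\\s+").toList)
  }
}

object ExtractorDemo {
  def describe(s: String): String = s match {
    case Quoted(inner)        => "quoted: " + inner
    case Words(first, second) => "two words: " + first + ", " + second // exactly 2
    case Words(first, _*)     => "starts with: " + first               // 1 or more
    case _                    => "other"
  }

  def main(args: Array[String]): Unit =
    List(Quoted("test"), "hello world", "a b c", "   ").foreach(s => println(describe(s)))
}
```

Note that "a b c" skips the Words(first, second) case for the same reason "test" skipped case regex(token) in the question: the extracted list has the wrong length.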