
Regex and Pattern Matching in Scala Part II

As a follow-up to this question

Here is some code that compiles and runs correctly, using captures.

val myString = "ACATCGTAGCTGCTAGCTG"

val nucCap = "([ACTG]+)".r

myString match {
   case nucCap(myNuc) => println("dna:"+myNuc)
   case _ => println("not dna")
}

>scala scalaTest.scala 
dna:ACATCGTAGCTGCTAGCTG

Here is simpler code, without capture, that does not compile.

val myString = "ACATCGTAGCTGCTAGCTG"

val nuc = "[ACGT]+".r

myString match {
     case nuc => println("dna")
     case _ => println("not dna")
}

>scala scalaTest.scala
scalaTest.scala:7: error: unreachable code

It seems like the match should yield a boolean result regardless of whether a capture is used. What is going on here?

asked Oct 25 '11 19:10 by Jeremy Leipzig


1 Answer

In your match block, nuc is a pattern variable and does not refer to the nuc in the enclosing scope. This makes the default case unreachable because the simple pattern nuc will match anything.
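To make the shadowing concrete, here is a minimal sketch. The object name ShadowingDemo and the method describe are hypothetical, introduced only for illustration. A lowercase identifier used as a whole pattern binds a fresh variable to the value being matched, so the Regex of the same name is never consulted:

```scala
object ShadowingDemo {
  // The Regex from the question; it is NOT used by the match below.
  val nuc = "[ACGT]+".r

  def describe(s: String): String = s match {
    // Here `nuc` is a new pattern variable bound to s. It shadows the
    // Regex above, so this case accepts every input string.
    case nuc => "bound to: " + nuc
  }
}
```

Calling ShadowingDemo.describe with a string that is clearly not DNA still takes the first case, which is why any case written after it is reported as unreachable.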

An empty pair of parentheses on nuc will make the syntactic sugar work and call the unapplySeq method on the Regex:

myString match {
  case nuc() => println("dna")
  case _ => println("not dna")
}

One way to avoid this pitfall is to rename nuc to Nuc. In a pattern, an identifier that starts with an uppercase letter is treated as a stable identifier, so it refers to the Nuc in the enclosing scope rather than introducing a new pattern variable.

val Nuc = "[ACGT]+".r
myString match {
  case Nuc => println("dna")
  case _ => println("not dna")
}

The above will print "not dna", because here we are simply comparing Nuc to myString, and they are not equal. It's a bug, but maybe a less confusing one!
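The equality semantics of a stable identifier pattern can be seen in isolation with plain strings. This is a sketch under assumptions: the names StableIdDemo, Target, lower, viaUppercase and viaBackticks are hypothetical. It also shows that backticks give a lowercase name the same treatment:

```scala
object StableIdDemo {
  val Target = "ACGT" // uppercase: usable directly as a stable identifier pattern
  val lower  = "ACGT" // lowercase: needs backticks to avoid becoming a pattern variable

  // Compares s with Target using ==; no regex is involved.
  def viaUppercase(s: String): Boolean = s match {
    case Target => true
    case _      => false
  }

  // Backticks force `lower` to refer to the value in the enclosing scope.
  def viaBackticks(s: String): Boolean = s match {
    case `lower` => true
    case _       => false
  }
}
```

Both methods return true only for the exact string "ACGT", which illustrates that a bare stable identifier performs an equality test rather than a regex match.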

Adding the parentheses will have the desired effect in this case too:

myString match {
  case Nuc() => println("dna")
  case _ => println("not dna")
}
// prints "dna"
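One detail worth knowing: a Regex used as an extractor has to match the whole input, not just part of it. The sketch below assumes hypothetical names (WholeStringDemo, NucAnywhere, classify, containsDna) and uses the standard unanchored method on Regex to contrast the two behaviours:

```scala
object WholeStringDemo {
  val Nuc = "[ACGT]+".r
  // unanchored returns a Regex whose extractor succeeds if the pattern
  // is found anywhere in the input. It must be stored in a val, because
  // an extractor pattern needs a stable identifier.
  val NucAnywhere = Nuc.unanchored

  // Succeeds only when the entire string consists of A, C, G, T.
  def classify(s: String): String = s match {
    case Nuc() => "dna"
    case _     => "not dna"
  }

  // Succeeds when some substring consists of A, C, G, T.
  def containsDna(s: String): Boolean = s match {
    case NucAnywhere() => true
    case _             => false
  }
}
```

With these definitions, a string such as "ACATCG!" is classified as "not dna" by classify, while containsDna accepts it.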

By the way, it is not a boolean that is being returned, but an Option[List[String]]:

scala> nuc.unapplySeq(myString)
res17: Option[List[String]] = Some(List())
scala> nucCap.unapplySeq(myString)
res18: Option[List[String]] = Some(List(ACATCGTAGCTGCTAGCTG))
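If a plain boolean is what you want, here is a hedged sketch of two ways to get one. The object and method names are hypothetical; the calls themselves (unapplySeq, isDefined, pattern, and java.util.regex.Matcher.matches) are standard:

```scala
object BooleanMatchDemo {
  val nuc    = "[ACGT]+".r
  val nucCap = "([ACGT]+)".r

  // The Option is defined exactly when the whole string matches.
  def isDnaViaOption(s: String): Boolean = nuc.unapplySeq(s).isDefined

  // Same test through the underlying java.util.regex.Pattern;
  // Matcher.matches() also requires a whole-string match.
  def isDnaViaMatcher(s: String): Boolean = nuc.pattern.matcher(s).matches()

  // With one capture group, the list holds one element per group.
  def captured(s: String): Option[List[String]] = nucCap.unapplySeq(s)
}
```

The length of the returned list equals the number of capture groups, which is why a pattern with no groups is written with empty parentheses and a pattern with one group takes one argument.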
answered Oct 11 '22 12:10 by Ben James