Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to pattern match using regular expression in Scala?

Tags:

regex

scala

People also ask

How do I match a pattern in regex?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

Does Scala have pattern matching?

Notes. Scala's pattern matching statement is most useful for matching on algebraic types expressed via case classes. Scala also allows the definition of patterns independently of case classes, using unapply methods in extractor objects.

What is a regex in Scala?

Regular Expressions explain a common pattern utilized to match a series of input data so, it is helpful in Pattern Matching in numerous programming languages. In Scala Regular Expressions are generally termed as Scala Regex. Regex is a class which is imported from the package scala. util. matching.

Which package is used for pattern matching with regular expressions?

Java provides the java. util. regex package for pattern matching with regular expressions.


You can do this because regular expressions define extractors but you need to define the regex pattern first. I don't have access to a Scala REPL to test this but something like this should work.

val Pattern = "([a-cA-C])".r
word.firstLetter match {
   case Pattern(c) => c bound to capture group here
   case _ =>
}

Since version 2.10, one can use Scala's string interpolation feature:

implicit class RegexOps(sc: StringContext) {
  def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}

scala> "123" match { case r"\d+" => true case _ => false }
res34: Boolean = true

Even better one can bind regular expression groups:

scala> "123" match { case r"(\d+)$d" => d.toInt case _ => 0 }
res36: Int = 123

scala> "10+15" match { case r"(\d\d)${first}\+(\d\d)${second}" => first.toInt+second.toInt case _ => 0 }
res38: Int = 25

It is also possible to set more detailed binding mechanisms:

scala> object Doubler { def unapply(s: String) = Some(s.toInt*2) }
defined module Doubler

scala> "10" match { case r"(\d\d)${Doubler(d)}" => d case _ => 0 }
res40: Int = 20

scala> object isPositive { def unapply(s: String) = s.toInt >= 0 }
defined module isPositive

scala> "10" match { case r"(\d\d)${d @ isPositive()}" => d.toInt case _ => 0 }
res56: Int = 10

An impressive example on what's possible with Dynamic is shown in the blog post Introduction to Type Dynamic:

object T {

  class RegexpExtractor(params: List[String]) {
    def unapplySeq(str: String) =
      params.headOption flatMap (_.r unapplySeq str)
  }

  class StartsWithExtractor(params: List[String]) {
    def unapply(str: String) =
      params.headOption filter (str startsWith _) map (_ => str)
  }

  class MapExtractor(keys: List[String]) {
    def unapplySeq[T](map: Map[String, T]) =
      Some(keys.map(map get _))
  }

  import scala.language.dynamics

  class ExtractorParams(params: List[String]) extends Dynamic {
    val Map = new MapExtractor(params)
    val StartsWith = new StartsWithExtractor(params)
    val Regexp = new RegexpExtractor(params)

    def selectDynamic(name: String) =
      new ExtractorParams(params :+ name)
  }

  object p extends ExtractorParams(Nil)

  Map("firstName" -> "John", "lastName" -> "Doe") match {
    case p.firstName.lastName.Map(
          Some(p.Jo.StartsWith(fn)),
          Some(p.`.*(\\w)$`.Regexp(lastChar))) =>
      println(s"Match! $fn ...$lastChar")
    case _ => println("nope")
  }
}

As delnan pointed out, the match keyword in Scala has nothing to do with regexes. To find out whether a string matches a regex, you can use the String.matches method. To find out whether a string starts with an a, b or c in lower or upper case, the regex would look like this:

word.matches("[a-cA-C].*")

You can read this regex as "one of the characters a, b, c, A, B or C followed by anything" (. means "any character" and * means "zero or more times", so ".*" is any string).


To expand a little on Andrew's answer: The fact that regular expressions define extractors can be used to decompose the substrings matched by the regex very nicely using Scala's pattern matching, e.g.:

val Process = """([a-cA-C])([^\s]+)""".r // define first, rest is non-space
for (p <- Process findAllIn "aha bah Cah dah") p match {
  case Process("b", _) => println("first: 'a', some rest")
  case Process(_, rest) => println("some first, rest: " + rest)
  // etc.
}

String.matches is the way to do pattern matching in the regex sense.

But as a handy aside, word.firstLetter in real Scala code looks like:

word(0)

Scala treats Strings as a sequence of Char's, so if for some reason you wanted to explicitly get the first character of the String and match it, you could use something like this:

"Cat"(0).toString.matches("[a-cA-C]")
res10: Boolean = true

I'm not proposing this as the general way to do regex pattern matching, but it's in line with your proposed approach to first find the first character of a String and then match it against a regex.

EDIT: To be clear, the way I would do this is, as others have said:

"Cat".matches("^[a-cA-C].*")
res14: Boolean = true

Just wanted to show an example as close as possible to your initial pseudocode. Cheers!


Note that the approach from @AndrewMyers's answer matches the entire string to the regular expression, with the effect of anchoring the regular expression at both ends of the string using ^ and $. Example:

scala> val MY_RE = "(foo|bar).*".r
MY_RE: scala.util.matching.Regex = (foo|bar).*

scala> val result = "foo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = foo

scala> val result = "baz123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match

scala> val result = "abcfoo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match

And with no .* at the end:

scala> val MY_RE2 = "(foo|bar)".r
MY_RE2: scala.util.matching.Regex = (foo|bar)

scala> val result = "foo123" match { case MY_RE2(m) => m; case _ => "No match" }
result: String = No match

First we should know that regular expression can separately be used. Here is an example:

import scala.util.matching.Regex
val pattern = "Scala".r // <=> val pattern = new Regex("Scala")
val str = "Scala is very cool"
val result = pattern findFirstIn str
result match {
  case Some(v) => println(v)
  case _ =>
} // output: Scala

Second we should notice that combining regular expression with pattern matching would be very powerful. Here is a simple example.

val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-11-20" match {
  case date(year, month, day) => "hello"
} // output: hello

In fact, regular expression itself is already very powerful; the only thing we need to do is to make it more powerful by Scala. Here are more examples in Scala Document: http://www.scala-lang.org/files/archive/api/current/index.html#scala.util.matching.Regex