In Ruby, given two Regexps, I can create another Regexp that matches either of them, like this:
a = /\d+/ # Matches digits
b = /\s+/ # Matches whitespaces
c = Regexp.union(a, b) # Matches sequences that consist only of digits or only of whitespaces
I want to do the same thing in Scala, but I have not found out how. Note that I am not asking for a syntax to write the alternation by hand, like (\d+)|(\s+) for the previous example. I am really looking for a way to create a new Regexp from two given Regexps.
In the end, I will not do this for just two Regexps but for a large number of them. I don't care about grouping or anything, I just want to know whether a String matches one of a list of given Regexps. I could check all of them in a loop, but that is too inefficient, which is why I need a single Regexp that checks the union.
Scala uses the Java regex engine, which is based on the class java.util.regex.Pattern. Pattern can only create a regex by compiling a string (apart from this method there is just an overload that also takes flags):
public static Pattern compile(String regex)
That's it, and Scala doesn't give you any relevant enhancements.
But one thing you can do is use the built-in unioning in match statements, here shown with capturing groups in case you want to pull something out of the string:
val Dig = """(\d+)""".r
val Wsp = """(\s+)""".r
scala> "45" match { case Dig(_) | Wsp(_) => println("found"); case _ => }
found
scala> " " match { case Dig(_) | Wsp(_) => println("found"); case _ => }
found
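The alternatives in a match statement have to be written out by hand, so they do not scale to a list built at runtime. A minimal sketch of the same check for a list, assuming a hypothetical object AnyMatch and example patterns that are not from the original post, could look like this. Note that it still tries the regexes one after another, which is the loop the question wanted to avoid.

```scala
import scala.util.matching.Regex

object AnyMatch {
  // Example patterns (assumed for illustration); the list could be built at runtime.
  val regexes: List[Regex] = List("""\d+""".r, """\s+""".r)

  // True if the whole string matches at least one regex in the list.
  // exists stops at the first regex that matches.
  def matchesAny(s: String): Boolean =
    regexes.exists(_.pattern.matcher(s).matches())
}
```

AnyMatch.matchesAny("45") and AnyMatch.matchesAny(" ") are both true, while a mixed string such as "4 5" is rejected.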
If you really want a combined regex, you have to do it at the string level. You can get the Java Pattern from a Scala regex with .pattern, and another .pattern then gets the string. Most regexes can be wrapped safely in (?:) to get a non-capturing group, so you can combine like so:
val Both = ("(?:"+Dig.pattern.pattern+")|(?:"+Wsp.pattern.pattern+")").r
However, the capturing groups of both branches are still represented in the result, and the group of the branch that did not match will be null (not exactly a good way to write idiomatic Scala, but this is what Java uses):
scala> "2" match { case Both(d,w) => if (w!=null) println("white") else println(d) }
2
scala> " " match { case Both(d,w) => if (w!=null) println("white") else println(d) }
white
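The same string-level wrapping in (?:) extends to any number of regexes. The following is a minimal sketch, not a standard library feature: the object RegexUnion and its methods union and matchesWhole are hypothetical names introduced for illustration. It assumes, as noted above, that each pattern can be wrapped safely in a non-capturing group.

```scala
import scala.util.matching.Regex

object RegexUnion {
  // Hypothetical helper: wraps each pattern in a non-capturing group
  // and joins the groups with the alternation operator '|'.
  def union(regexes: Seq[Regex]): Regex =
    if (regexes.isEmpty) "(?!)".r // empty negative lookahead: never matches
    else regexes.map(r => "(?:" + r.pattern.pattern + ")").mkString("|").r

  // True if the whole string matches at least one of the alternatives.
  def matchesWhole(combined: Regex, s: String): Boolean =
    combined.pattern.matcher(s).matches()
}
```

For example, with val any = RegexUnion.union(Seq("""\d+""".r, """\s+""".r, "a".r)), the call RegexUnion.matchesWhole(any, "123") is true and RegexUnion.matchesWhole(any, "123 ") is false. Flags passed to Pattern.compile for the individual patterns are not carried over, since only the pattern strings are combined.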
If you want to combine and reuse regex parts, I wrote REL, a library/DSL that does just that. Example usage for your case:
import fr.splayce.rel._
import Implicits._
val a: RE = "\\d+"
val b: RE = "\\s+"
val c: RE = a | b
c has an r method to get a Regex object. It is also in Implicits, so you can use it as a regex, say c findAllIn someText. It will automatically wrap a and b in non-capturing groups if needed.
If you have a collection of regexes, you can just do reduceLeft:
val regexes: List[RE] = List("a", "b", "c")
regexes.reduceLeft(_ | _)
On a side note: with Symbols._ you have short notations for things like \d and \s. Thus, with REL, you can write the first example directly as:
val c = δ.+ | σ.+
It also provides ways to reuse and combine the associated extractors.
If you prefer vanilla Scala, then I have nothing to add to Rex Kerr's answer.
@akauppi if you want a list of regexes to match a given string you could do something like this:
val regexes = List("\\d+".r, "\\s+".r, "a".r)
val single = s"(${regexes.mkString("|")})".r
"123" match {
case single(_*) => println("match")
case _ => println("no match")
}
// above prints: match
"123 " match {
case single(_*) => println("match")
case _ => println("no match")
}
// above prints: no match
A simple way to use a list of regexes is to join them with the regex alternation operator |. The code above is really the same as saying
val single = "(\\d+|\\s+|a)".r
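The "123 " example prints "no match" because a Regex used as a pattern in a match statement has to match the whole input. A minimal sketch of the difference between a whole-string match and a search anywhere in the input, using the same combined pattern (the object AnchoringDemo and its method names are assumptions made for illustration):

```scala
import scala.util.matching.Regex

object AnchoringDemo {
  val single: Regex = "(\\d+|\\s+|a)".r

  // Extractor patterns are anchored: the whole input must match.
  def wholeMatch(s: String): Boolean = s match {
    case single(_*) => true
    case _          => false
  }

  // findFirstIn searches for a match anywhere in the input.
  def containsMatch(s: String): Boolean =
    single.findFirstIn(s).isDefined
}
```

Here AnchoringDemo.wholeMatch("123 ") is false, while AnchoringDemo.containsMatch("123 ") is true because the leading digits already match one alternative.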