Scala Regex union

Tags: regex, scala

In Ruby, if I have two Regexps, I have the possibility to create another regexp like this:

a = /\d+/ # Matches digits
b = /\s+/ # Matches whitespaces
c = Regexp.union(a, b) # Matches sequences that consist only of digits or only of whitespaces

I want to do the same thing in Scala, but I haven't found a way to do it. Note that I am not asking for regex syntax that writes the union out by hand, like (\d+)|(\s+) for the previous example; I am looking for a way to create a new Regexp from two given Regexps.

Actually, in the end, I will not do it for just two Regexps but for a large number. I don't care about grouping or anything; I just want to know whether a String matches one of a list of given Regexps. I could check all of them in a loop, but that is too inefficient, which is why I need a single Regexp that checks the union.

Lykos asked Dec 11 '12 15:12



3 Answers

Scala uses the Java regex engine, which is based on the class java.util.regex.Pattern. Pattern can only create a regex from a string, through its static compile method (plus an overload that also takes flags):

public static Pattern compile(String regex)

That's it, and Scala doesn't give you any relevant enhancements.

But one thing you can do is use the built-in unioning in match statements, here shown with capturing groups in case you want to pull something out of the string:

val Dig = """(\d+)""".r
val Wsp = """(\s+)""".r

scala> "45" match { case Dig(_) | Wsp(_) => println("found"); case _ => }

found

scala> "   " match { case Dig(_) | Wsp(_) => println("found"); case _ => }

found

If you really want a combined regex, you have to do it at the string level. You can get the Java Pattern from a Scala regex with .pattern, and calling .pattern again on that gives you the string. Most regexes can be wrapped safely in (?:...) to get a non-capturing group, so you can combine them like so:

val Both = ("(?:"+Dig.pattern.pattern+")|(?:"+Wsp.pattern.pattern+")").r

However, the capturing groups from both branches are all present in the result, and the groups of the branch that did not match will be null (not exactly idiomatic Scala, but this is what Java does):

scala> "2" match { case Both(d,w) => if (w!=null) println("white") else println(d) }
2

scala> " " match { case Both(d,w) => if (w!=null) println("white") else println(d) }
white
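The same string-level approach extends to a whole list of regexes. Here is a minimal sketch, assuming you only need a yes/no answer and that none of the patterns uses numbered backreferences (group numbers shift once the patterns are concatenated). `unionOf` and `matchesAny` are hypothetical helper names for illustration, not part of the Scala library:

```scala
import scala.util.matching.Regex

object RegexUnion {
  // Hypothetical helper: wrap each pattern string in a non-capturing
  // group and join the pieces with '|' to form one alternation.
  def unionOf(regexes: Seq[Regex]): Regex =
    regexes.map(r => "(?:" + r.pattern.pattern + ")").mkString("|").r

  // True if the whole string matches at least one of the original patterns.
  // No groups are extracted, so the null groups never come into play.
  def matchesAny(union: Regex, s: String): Boolean =
    union.pattern.matcher(s).matches()
}
```

For example, `RegexUnion.matchesAny(RegexUnion.unionOf(Seq(Dig, Wsp)), "45")` should be true, while a mixed string such as "4 5" should not match, because `matches()` requires the entire input to match one branch.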
Rex Kerr answered Oct 22 '22 16:10


If you want to combine and reuse regex parts, I wrote REL, a library/DSL that does just that. Example usage for your case:

import fr.splayce.rel._
import Implicits._

val a: RE = "\\d+"
val b: RE = "\\s+"
val c: RE = a | b

c has an r method to get a Regex object. The conversion is also in Implicits, so you can use c directly as a regex, say c findAllIn someText. It will automatically wrap a and b in non-capturing groups if needed.

If you have a collection of regexes, you can just do reduceLeft:

val regexes: List[RE] = List("a", "b", "c")
regexes.reduceLeft(_ | _)

On a side note:

  • if you import Symbols._, you have short notations for things like \d and \s
  • it implements most of your usual regex operations for maximum reusability

Thus, with REL, you can write the first example directly as:

val c = δ.+ | σ.+

It also provides ways to reuse and combine the associated extractors.

If you prefer vanilla Scala, then I have nothing to add to Rex Kerr's answer.

instanceof me answered Oct 22 '22 16:10


@akauppi, if you want a list of regexes to match a given string, you could do something like this:

val regexes = List("\\d+".r, "\\s+".r, "a".r)
val single  = s"(${regexes.mkString("|")})".r
"123" match {
  case single(_*) => println("match")
  case _ => println("no match")
}
// above prints: match

"123  " match {
  case single(_*) => println("match")
  case _ => println("no match")
}
// above prints: no match
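
The second string does not match because a pattern match on a Regex extractor requires the whole input to match. Here is a small sketch of the difference, assuming you may also want to detect a match anywhere inside the string. The object and method names are made up for illustration; `findFirstIn` is the standard Regex method:

```scala
import scala.util.matching.Regex

object WholeVsPartial {
  val single: Regex = "(\\d+|\\s+|a)".r

  // Whole-string check, the same semantics as `case single(_*)`.
  def matchesWhole(s: String): Boolean =
    single.pattern.matcher(s).matches()

  // Substring check: true if any part of the string matches.
  def matchesSomewhere(s: String): Boolean =
    single.findFirstIn(s).isDefined
}
```

With these, "123  " fails the whole-string check but passes the substring check, since the leading digits alone satisfy one branch of the alternation.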

The simplest way to combine a list of regexes is regex alternation. The code above is really the same as writing:

val single = "(\\d+|\\s+|a)".r
gdoubleod answered Oct 22 '22 16:10