Is there a simple way to return regex matches as an array?
Here is what I am trying in Scala 2.7.7:
val s = """6 1 2"""
val re = """(\d+)\s(\d+)\s(\d+)""".r
for (m <- re.findAllIn (s)) println (m) // prints "6 1 2"
re.findAllIn (s).toList.length // 3? No! It returns 1!
But I then tried:
s match {
case re (m1, m2, m3) => println (m1)
}
And this works fine! m1 is 6, m2 is 1, etc.
Then I found something that added to my confusion:
val mit = re.findAllIn (s)
println (mit.toString)
println (mit.length)
println (mit.toString)
That prints:
non-empty iterator
1
empty iterator
The "length" call somehow modifies the state of the iterator. What is going on here?
Ok, first of all, understand that findAllIn returns an Iterator. An Iterator is a consume-once mutable object. ANYTHING you do to it will change it, including calling length, which has to walk through every element to count them and so leaves the iterator empty. Read up on iterators if you are not familiar with them. If you want it to be reusable, then convert the result of findAllIn into a List, and only use that list.
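As a rough sketch of that advice (written against a recent Scala version rather than 2.7.7, and with the object name IteratorDemo made up purely for illustration), the iterator can be turned into a List right away so that later calls no longer consume anything:

```scala
object IteratorDemo {
  val re = """(\d+)\s(\d+)\s(\d+)""".r
  val s = "6 1 2, 4 1 3"

  // Materialise the one-shot iterator into an immutable List straight away.
  val matches: List[String] = re.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    // A List can be traversed as often as needed without changing.
    println(matches.length) // 2
    println(matches)        // List(6 1 2, 4 1 3)
    println(matches.length) // still 2
  }
}
```

Calling toArray instead of toList gives an Array[String] if an array is really what is needed.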
Now, it seems you want all matching groups, not all matches. The method findAllIn will return all matches of the full regex that can be found on the string. For example:
scala> val s = """6 1 2, 4 1 3"""
s: java.lang.String = 6 1 2, 4 1 3
scala> val re = """(\d+)\s(\d+)\s(\d+)""".r
re: scala.util.matching.Regex = (\d+)\s(\d+)\s(\d+)
scala> for(m <- re.findAllIn(s)) println(m)
6 1 2
4 1 3
See that there are two matches, and neither of them includes the ", " in the middle of the string, since that's not part of any match.
If you want the groups, you can get them like this:
scala> val s = """6 1 2"""
s: java.lang.String = 6 1 2
scala> re.findFirstMatchIn(s)
res4: Option[scala.util.matching.Regex.Match] = Some(6 1 2)
scala> res4.get.subgroups
res5: List[String] = List(6, 1, 2)
Or, using findAllIn, like this:
scala> val s = """6 1 2"""
s: java.lang.String = 6 1 2
scala> for(m <- re.findAllIn(s).matchData; e <- m.subgroups) println(e)
6
1
2
The matchData method will make an Iterator that returns Match instead of String.
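If the goal is to keep the groups rather than print them, the same matchData iterator can be mapped and collected. This is only a sketch, assuming a recent Scala version; GroupsDemo is an illustrative name, not something from the question:

```scala
object GroupsDemo {
  val re = """(\d+)\s(\d+)\s(\d+)""".r
  val s = "6 1 2, 4 1 3"

  // One inner List per match, holding that match's captured groups.
  val groups: List[List[String]] =
    re.findAllIn(s).matchData.map(_.subgroups).toList

  def main(args: Array[String]): Unit =
    println(groups) // List(List(6, 1, 2), List(4, 1, 3))
}
```

Because the result is collected with toList before anything else touches it, the consume-once behaviour of the underlying iterator is no longer an issue.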
There is a difference between how unapplySeq interprets multiple groups and how findAllIn does. findAllIn scans your pattern over the string and returns each string that matches (advancing by the match if it succeeds, or one character if it fails).
So, for example:
scala> val s = "gecko 6 1 2 3 4 5"
scala> re.findAllIn(s).toList
res3: List[String] = List(6 1 2, 3 4 5)
On the other hand, unapplySeq assumes a perfect match to the sequence.
scala> re.unapplySeq(s)
res4: Option[List[String]] = None
So, if you want to parse apart groups that you have specified in an exact regex string, use unapplySeq. If you want to find those subsets of the string that look like your regex pattern, use findAllIn. If you want to do both, chain them yourself:
scala> re.findAllIn(s).flatMap(text => re.unapplySeq(text).elements )
res5: List[List[String]] = List(List(6, 1, 2), List(3, 4, 5))
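The elements method used above belongs to the 2.7-era collections and was deprecated in later releases in favour of iterator. As a hedged sketch for a newer Scala version (ChainDemo and its member names are made up for illustration), the same chaining could look like this, either by re-matching each found substring with unapplySeq or by reading the groups directly from findAllMatchIn:

```scala
object ChainDemo {
  val re = """(\d+)\s(\d+)\s(\d+)""".r
  val s = "gecko 6 1 2 3 4 5"

  // Variant 1: find each matching substring, then match it exactly with unapplySeq.
  // unapplySeq returns an Option, which flatMap flattens away.
  val viaUnapplySeq: List[List[String]] =
    re.findAllIn(s).toList.flatMap(text => re.unapplySeq(text))

  // Variant 2: ask for Match objects directly and read their groups.
  val viaMatchData: List[List[String]] =
    re.findAllMatchIn(s).map(_.subgroups).toList

  def main(args: Array[String]): Unit = {
    println(viaUnapplySeq) // List(List(6, 1, 2), List(3, 4, 5))
    println(viaMatchData)  // List(List(6, 1, 2), List(3, 4, 5))
  }
}
```

The second variant avoids running the regex twice over each match, which may matter for longer inputs.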