I am trying to match all the occurrences of a regex and get the indices as a result. The example from Real World Haskell says I can do
string =~ regex :: [(Int, Int)]
However, this no longer works because the regex library has changed since RWH was published. (See All matches of regex in Haskell and "=~" raise "No instance for (RegexContext Regex [Char] [String])"). What is the correct way to do this?
Update:
I found matchAll which might give me what I want. I have no idea how to use it, though.
The key to using matchAll is the type annotation :: Regex when creating regexes:
import Text.Regex
import Text.Regex.Base
re = makeRegex "[^aeiou]" :: Regex
test = matchAll re "the quick brown fox"
This returns a list of arrays, one per match. Element 0 of each array is the (offset, length) pair for the whole match, so to get a list of such pairs, index each array at 0:
import Data.Array ((!))
matches = map (!0) $ matchAll re "the quick brown fox"
-- [(0,1),(1,1),(3,1),(4,1),(7,1),(8,1),(9,1),(10,1),(11,1),(13,1),(14,1),(15,1),(16,1),(18,1)]
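Once you have the (offset, length) pairs, you can cut the matched text back out of the input. The following is a minimal sketch, assuming the regex-posix package is installed; matchSpans and slice are hypothetical helper names, not part of the library.

```haskell
import Text.Regex.Posix  -- re-exports Text.Regex.Base (makeRegex, matchAll)
import Data.Array ((!))

-- Hypothetical helper: (offset, length) of every whole match of pat in s.
matchSpans :: String -> String -> [(MatchOffset, MatchLength)]
matchSpans pat s = map (! 0) (matchAll re s)
  where
    re = makeRegex pat :: Regex  -- the annotation picks the POSIX backend

-- Hypothetical helper: cut one matched substring out of the input.
slice :: String -> (MatchOffset, MatchLength) -> String
slice s (off, len) = take len (drop off s)

main :: IO ()
main = do
  let text  = "the quick brown fox"
      spans = matchSpans "[a-z]+" text
  print spans                     -- [(0,3),(4,5),(10,5),(16,3)]
  print (map (slice text) spans)  -- ["the","quick","brown","fox"]
```

Using drop and take on a String is linear in the offset, which is fine for short inputs; for large texts a ByteString source would probably be a better fit.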
The =~ operator may behave differently than it did in RWH. To use it, use the predefined types MatchOffset and MatchLength and the special type constructor AllMatches:
import Text.Regex.Posix
re = "[^aeiou]"
text = "the quick brown fox"
test1 = text =~ re :: Bool
-- True
test2 = text =~ re :: String
-- "t"
test3 = text =~ re :: (MatchOffset,MatchLength)
-- (0,1)
test4 = text =~ re :: AllMatches [] (MatchOffset, MatchLength)
-- (not showable)
test4' = getAllMatches $ (text =~ re :: AllMatches [] (MatchOffset, MatchLength))
-- [(0,1),(1,1),(3,1),(4,1),(7,1),(8,1),(9,1),(10,1),(11,1),(13,1),(14,1),(15,1),(16,1),(18,1)]
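If you want the matched text rather than the positions, the same module also provides the AllTextMatches wrapper with getAllTextMatches. A short sketch, again assuming regex-posix; the names spans and vowelRuns are only for illustration:

```haskell
import Text.Regex.Posix

text :: String
text = "the quick brown fox"

-- Offsets and lengths of every match (same context as test4' above).
spans :: [(MatchOffset, MatchLength)]
spans = getAllMatches (text =~ "[aeiou]+" :: AllMatches [] (MatchOffset, MatchLength))

-- The matched text of every match.
vowelRuns :: [String]
vowelRuns = getAllTextMatches (text =~ "[aeiou]+" :: AllTextMatches [] String)

main :: IO ()
main = do
  print spans           -- [(2,1),(5,2),(12,1),(17,1)]
  print vowelRuns       -- ["e","ui","o","o"]
  print (length spans)  -- 4
```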
See the docs for Text.Regex.Base.Context for more details on what contexts are available.
UPDATE: I believe the type constructor AllMatches was introduced to resolve the ambiguity that arises when a regex has subexpressions, e.g.:
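-- Note: foo has no type signature, so using it at two result types works only where the
-- monomorphism restriction is off (e.g. in GHCi); in a compiled module, inline the expression at each use.
foo = "axx ayy" =~ "a(.)([^a])"
test1 = getAllMatches $ (foo :: AllMatches [] (MatchOffset, MatchLength))
-- [(0,3),(4,3)]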
-- returns the locations of "axx" and "ayy" but no subexpression info
test2 = foo :: MatchArray
-- array (0,2) [(0,(0,3)),(1,(1,1)),(2,(2,1))]
-- returns only the match with "axx"
Both are essentially a list of offset-length pairs, but they mean different things.
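To get subexpression info for every match rather than only the first, one option is to go back to matchAll, which yields one MatchArray per match. A sketch under the same regex-posix assumption; allWithGroups and allGroupTexts are hypothetical names:

```haskell
import Text.Regex.Posix
import Data.Array (elems)

-- Hypothetical helper: for each match, the (offset, length) of the whole
-- match followed by the (offset, length) of each subexpression.
allWithGroups :: String -> String -> [[(MatchOffset, MatchLength)]]
allWithGroups pat s = map elems (matchAll (makeRegex pat :: Regex) s)

-- The [[String]] context gives the same structure as text instead of positions.
allGroupTexts :: String -> String -> [[String]]
allGroupTexts pat s = s =~ pat

main :: IO ()
main = do
  print (allWithGroups "a(.)([^a])" "axx ayy")
  -- [[(0,3),(1,1),(2,1)],[(4,3),(5,1),(6,1)]]
  print (allGroupTexts "a(.)([^a])" "axx ayy")
  -- [["axx","x","x"],["ayy","y","y"]]
```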