Grouping in Haskell regular expressions

Tags:

regex

haskell

How can I extract a string using regular expressions in Haskell?

let x = "xyz abc" =~ "(\\w+) \\w+" :: String

That doesn't even get a match.

let x = "xyz abc" =~ "(.*) .*" :: String

That does match, but x ends up as "xyz abc". How do I extract only the first regex group so that x is "xyz"?

asked Apr 08 '11 06:04

sipsorcery

2 Answers

I wrote/maintain such packages as regex-base, regex-pcre, and regex-tdfa.

In regex-base the Text.Regex.Base.Context module documents the large number of instances of RegexContext that =~ uses. These are implemented on top of RegexLike which provides the underlying way to call matchText and matchAllText.

The [[String]] that KennyTM mentions is another instance of RegexContext, and may or may not be one that works best for you. A comprehensive instance is

RegexContext a b (AllTextMatches (Array Int) (MatchText b))

type MatchText source = Array Int (source, (MatchOffset, MatchLength))

which can be used to get a MatchText for everything:

let x :: Array Int (MatchText String)
    x = getAllTextMatches $ "xyz abc" =~ "(\\w+) \\w+"

At this point x is an Array Int of matches, and each match is itself an Array Int of group matches (index 0 is the whole match, index 1 the first group, and so on).
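
To make the nested-array shape concrete, here is a minimal sketch that pulls the first capture group out of every match. It assumes the regex-pcre and array packages are installed; the function name firstGroups is hypothetical and only for illustration.

```haskell
import Data.Array (Array, elems, (!))
import Text.Regex.PCRE ((=~), AllTextMatches (getAllTextMatches), MatchText)

-- | Hypothetical helper: text of capture group 1 for every match.
-- Each MatchText is an array indexed by group number, where index 0
-- holds the whole match and index 1 holds the first group.
firstGroups :: String -> [String]
firstGroups input = [fst (groups ! 1) | groups <- elems allMatches]
  where
    allMatches :: Array Int (MatchText String)
    allMatches = getAllTextMatches (input =~ "(\\w+) \\w+")
```

With this sketch, firstGroups "xyz abc more text" should evaluate to ["xyz","more"], and head (firstGroups "xyz abc") gives the "xyz" asked for in the question.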

Note that "\w" is Perl syntax, so you need regex-pcre to access it. If you want Unix/Posix extended regular expressions, you should use regex-tdfa, which is cross-platform, and avoid regex-posix, which hits each platform's bugs in its implementation of the regex.h library.
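
For comparison, a rough sketch of the same extraction with regex-tdfa, assuming that package is installed. It replaces "\w" with the Posix bracket expression [[:alnum:]_] and uses the (before, match, after, groups) tuple instance of RegexContext; the name firstWord is hypothetical.

```haskell
import Text.Regex.TDFA ((=~))

-- | Hypothetical helper: first capture group of the first match, if any.
-- The 4-tuple result is (text before, matched text, text after, groups);
-- when nothing matches, the list of groups is empty.
firstWord :: String -> Maybe String
firstWord input =
  case (input =~ "([[:alnum:]_]+) [[:alnum:]_]+" :: (String, String, String, [String])) of
    (_, _, _, g1 : _) -> Just g1
    _                 -> Nothing
```

Under these assumptions, firstWord "xyz abc" should give Just "xyz", and an input with no match gives Nothing.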

Note that Perl vs Posix is not just a matter of syntax like "\w". They use very different algorithms and often return different results. The time and space complexity are also very different: for matching against a string of length n, Perl style (regex-pcre) can be O(exp(n)) in time, while Posix style using regex-tdfa is always O(n) in time.

answered Sep 23 '22 02:09

Chris Kuklewicz

Cast the result as [[String]]. Then you'll get a list of matches, each being the list of matched text and the captured subgroups.

Prelude Text.Regex.PCRE> "xyz abc more text" =~ "(\\w+) \\w+" :: [[String]]
[["xyz abc","xyz"],["more text","more"]]
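
Building on the [[String]] result, a small sketch of getting just "xyz" out, assuming regex-pcre is installed. The name firstGroup is hypothetical; the pattern match relies on each inner list being the whole match followed by its groups, as in the output above.

```haskell
import Text.Regex.PCRE ((=~))

-- | Hypothetical helper: first capture group of the first match, if any.
-- Each inner list has the shape [whole match, group 1, group 2, ...].
firstGroup :: String -> Maybe String
firstGroup input =
  case (input =~ "(\\w+) \\w+" :: [[String]]) of
    ((_whole : g1 : _) : _) -> Just g1
    _                       -> Nothing
```

Returning a Maybe avoids a runtime error from head or !! when the input does not match at all.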

answered Sep 26 '22 02:09

kennytm
