Word splitting with regular expressions in Haskell

Tags: regex, haskell

There are several packages for using regular expressions in Haskell (e.g. Text.Regex.Base, Text.Regex.Posix, etc.). Most of the packages I've seen so far support only a subset of the regex syntax I know. For example, I am used to splitting a sentence into words with the following regex:

\w+

Nearly all of the Haskell packages I have tried so far don't support this (at least not the ones mentioned above, nor Text.Regex.TDFA). I know that with POSIX syntax [[:word:]]+ would have the same effect, but I would like to use the variant above.
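To make the intended semantics of \w+ concrete, here is a minimal sketch that needs no regex library at all, only base. It assumes a "word character" means a letter, a digit, or an underscore; the names isWordChar and wordMatches are hypothetical helpers, not part of any package. Note that Data.Char.isAlphaNum is Unicode-aware, so it may accept more characters than a regex engine's ASCII-only \w.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical helper: approximates the regex class \w (and [[:word:]]).
-- isAlphaNum is Unicode-aware, so e.g. 'é' also counts as a word character.
isWordChar :: Char -> Bool
isWordChar c = isAlphaNum c || c == '_'

-- Hypothetical helper: emulates repeatedly matching \w+ by returning every
-- maximal run of word characters and skipping everything in between.
wordMatches :: String -> [String]
wordMatches s = case dropWhile (not . isWordChar) s of
  ""  -> []
  s'  -> let (w, rest) = span isWordChar s'
         in w : wordMatches rest

main :: IO ()
main = print (wordMatches "Hello, world! It's snake_case.")
```

For the sample sentence this prints ["Hello","world","It","s","snake_case"]: punctuation separates words, while the underscore keeps snake_case together.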

This leads to the following questions:

Is there any package that supports this?
If so, why is a different syntax in common use?
What are the advantages or disadvantages of each?


asked Dec 07 '11 14:12

beyeran

1 Answer

The '\w' escape is Perl syntax and is supported by PCRE, which you can access in Haskell with my regex-pcre package or with the pcre-light library. If your input is a list of Char, the 'words' function from the standard Prelude may be enough; if your input is an ASCII ByteString, Data.ByteString.Char8 may work. There may be a UTF-8 library with word splitting, but I cannot quickly find it.
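A minimal sketch of the PCRE route, assuming the regex-pcre package (and the C PCRE library it binds) is installed; the helper name pcreWords is hypothetical. It collects every \w+ match through regex-base's getAllTextMatches, which Text.Regex.PCRE re-exports, and for comparison also runs Prelude's words, which splits only on whitespace and therefore leaves punctuation attached.

```haskell
-- Assumes the regex-pcre package is available.
import Text.Regex.PCRE ((=~), getAllTextMatches)

-- Hypothetical helper: all maximal matches of the Perl-style pattern \w+.
-- The backslash is doubled because the pattern is a Haskell string literal.
pcreWords :: String -> [String]
pcreWords s = getAllTextMatches (s =~ "\\w+")

main :: IO ()
main = do
  let sentence = "Hello, world! It's snake_case."
  print (pcreWords sentence)  -- regex-based: punctuation is dropped
  print (words sentence)      -- whitespace-based: punctuation stays attached
```

With this sentence, pcreWords gives ["Hello","world","It","s","snake_case"], whereas words gives ["Hello,","world!","It's","snake_case."]. Which one is preferable depends on whether punctuation should count as a word boundary.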

answered Oct 07 '22 02:10

Chris Kuklewicz
