
Why is there a limit on the length of a regex pattern in Text.Regex.Posix?

Tags:

haskell

I'm seeing an odd, seemingly arbitrary limit on the length of a regular expression pattern: once the pattern is longer than 30 characters, funny things start to happen.

Simple GHCi example:

> import Text.Regex.Posix
> "abcdefghijklmnopqrstuvwxyz0123456789" =~ "abcdefghijklmnopqrstuvwxyz0123" :: String
"abcdefghijklmnopqrstuvwxyz0123"
> "abcdefghijklmnopqrstuvwxyz0123456789" =~ "abcdefghijklmnopqrstuvwxyz01234" :: String
""

The only difference is the 4 added to the end of the second pattern. It's a valid regex and should match, but it gives me an empty string.

It gets even weirder if I add a few more valid characters to the pattern:

> "abcdefghijklmnopqrstuvwxyz0123456789" =~ "abcdefghijklmnopqrstuvwxyz01234567" :: String
"ab"

It reports that only ab matched, which is clearly wrong.

My environment:

  • Stack version 1.1.2 (resolver lts-6.7)
  • GHC version 7.10.3
  • OS: Windows 10
  • regex-posix-0.95.2

A complete uninstall and reinstall of Stack and all packages did not solve the problem.

Chad Gilbert asked Jul 12 '16 13:07


1 Answer

According to this discussion, the library has other issues as well, and they seem to stem from the underlying C code not having been properly ported to 64-bit architectures.

I have switched to the regex-tdfa package and no longer have these problems.
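For anyone making the same switch, here is a minimal sketch of the patterns from the question run through regex-tdfa. It assumes regex-tdfa is listed in the project's dependencies; the names subject, patterns and matchText are only illustrative. For literal patterns like these, usually only the import needs to change, since Text.Regex.TDFA exports the same =~ operator.

```haskell
module Main where

-- Assumption: the regex-tdfa package is available (e.g. added to build-depends).
import Text.Regex.TDFA ((=~))

-- The subject string from the question.
subject :: String
subject = "abcdefghijklmnopqrstuvwxyz0123456789"

-- The three patterns from the question: 30, 31 and 34 characters long.
patterns :: [String]
patterns =
  [ "abcdefghijklmnopqrstuvwxyz0123"
  , "abcdefghijklmnopqrstuvwxyz01234"
  , "abcdefghijklmnopqrstuvwxyz01234567"
  ]

-- Return the text of the first match, or "" when nothing matches.
matchText :: String -> String
matchText pat = subject =~ pat

main :: IO ()
main = mapM_ report patterns
  where
    -- Print the pattern length, the matched text, and whether the whole
    -- pattern was matched literally.
    report pat =
      putStrLn (show (length pat) ++ " chars: " ++ show (matchText pat)
                ++ " full match: " ++ show (matchText pat == pat))
```

Each pattern contains only letters and digits, so a correct engine should return the pattern itself as the matched text in all three cases.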

Chad Gilbert answered Oct 15 '22 22:10