Slow regex with python and javascript but fast to fail in go and php

Question

I wrote a regex to parse PostgreSQL errors to try and show to the user which field had duplicated data. The regex is this one:

^DETAIL:.[^$]+.(.[^$]+).[^$]+.(.[^$]+). already exists

It will be pretty fast if you run it against the right message like this one (https://regex101.com/r/GZuREV/1):

ERROR:  duplicate key value violates unique constraint "uq_content_block_internal_name_store_id"
DETAIL:  Key (lower(internal_name::text), store_id)=(some content block-32067683, 0c6d20a7-d843-44f3-af9c-4a2cf2a47e4c) already exists.

But if PostgreSQL emits another message like the following one, it will take python about 30 seconds in my machine to answer (https://regex101.com/r/GZuREV/2).

ERROR:  null value in column "active" violates not-null constraint
DETAIL:  Failing row contains (2018-08-16 14:23:52.214591+00, 2018-08-16 14:23:52.214591+00, null, 6f6d1bc9-c47e-46f8-b220-dae49bd58090, bf24d26e-4871-4335-9f18-83c5a52f1b3a, Some Product-a1c03dde-2de9-401c-92d5-5c1500908984, {"de_DE": "Fugit tempore voluptas quos est vitae.", "en_GB": "Qu..., {"de_DE": "Fuga reprehenderit nobis reprehenderit natus magni es..., {"de_DE": "Fuga provident dolorum. Corrupti sunt in tempore quae..., my-product-53077578, SKU-53075778, 600, 4300dc25-04e2-4193-94c0-8ee97b636739, 52553d24-6d1c-4ce6-89f9-4ad765599040, null, 38089c3c-423f-430c-b211-ab7a57dbcc13, 7d7dc30e-b06b-48b7-b674-26d4f705583b, null, {}, 0, null, 9980, 100, 1, 5).

If go to the regex101 link you can see that if you toggle to different languages like php or go, they all return quite fast saying that no match was found but if you choose python or javascript you will get a timeout.

My quick dirty fix was something like this:

match = 'already exists' in error_message and compiled_regex.search(error_message)

What do you think that could be causing that? Could it be the greedy operators to consume until I reach my desired data?

Update 1

Using re.IGNORECASE in python, makes it around 9 seconds slower as it spends too much time lowercasing something.

With ignorecase

Without ignorecase

Update 2

Playing around I saw that to make it work and fail a simple change will break

^DETAIL:.[^$]+?\((.[^$]+?).[^$]+?.(.[^$]+?). already exists
                            ^ just changing this to \) make it stop timing out
^DETAIL:.[^$]+?\((.[^$]+?)\)[^$]+?.(.[^$]+?). already exists

peterSO · Accepted Answer

Package regexp
import "regexp"
Package regexp implements regular expression search.

The syntax of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages. More precisely, it is the syntax accepted by RE2 and described at https://golang.org/s/re2syntax, except for \C. For an overview of the syntax, run
go doc regexp/syntax
The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input. (This is a property not guaranteed by most open source implementations of regular expressions.) For more information about this property, see

http://swtch.com/~rsc/regexp/regexp1.html

or any book about automata theory.

By design, Go regular expressions are guaranteed to run in time linear in the size of the input, a property not guaranteed by some other implementations of regular expressions. See Regular Expression Matching Can Be Simple And Fast.

Slow regex with python and javascript but fast to fail in go and php

Tags:

python

javascript

regex

go

Jayson Reis

1 Answers

peterSO

Recent Activity

Donate For Us

Slow regex with python and javascript but fast to fail in go and php

Tags:

python

javascript

regex

go

Jayson Reis

1 Answers

peterSO

Related questions

Recent Activity

Donate For Us