
Golang regex to extract values inside parentheses and ignore inner parentheses in any [duplicate]

Tags: regex, go, re2

I have the following example of key=value pairs in a single-line string:

start=("a", "b") and between=("range(2019, max, to=\"le\")") and end=("a", "b")

Using regex in Go, I want to extract the key=value pairs as shown below:

  1. start=("a", "b")
  2. between=("range(2019, max, to=\"le\")")
  3. end=("a", "b")

There are solutions on Stack Overflow, but they do not work with Go's regex engine.

Here is a link to my failed attempt with Go regex: regex101 golang flavor

I would appreciate any help.

Hussain asked Oct 15 '22 08:10


1 Answer

The problem is the escaped quotes:

\S+=(\([^(]*(?:[^("]*"(?:[^\\"]|\\["\\])*")(\)))

https://regex101.com/r/3ytO9P/1

I changed [^"] to (?:[^\\"]|\\["\\]). This makes the regex look for either a regular character or an escape. By matching the escape, it doesn’t allow \" to end the match.
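To see the effect of that change in isolation, here is a minimal Go sketch that looks only at the quoted-string part of the pattern. The `naive` form `"[^"]*"` is an assumption about what a string sub-pattern built on `[^"]` would look like (the question's own regex is only linked, not shown), and the variable names are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// naive (assumed form): the string body is "anything but a quote",
// so the match ends at the first quote, even one preceded by a backslash.
var naive = regexp.MustCompile(`"[^"]*"`)

// escapeAware: the string body is either a regular character or one of
// the escapes \" and \\, so an escaped quote cannot end the match.
var escapeAware = regexp.MustCompile(`"(?:[^\\"]|\\["\\])*"`)

func main() {
	// The quoted value from the question, backslashes included.
	s := `"range(2019, max, to=\"le\")"`

	fmt.Println(naive.FindString(s))       // stops right after to=\"
	fmt.Println(escapeAware.FindString(s)) // covers the whole literal
}
```

Both patterns use only syntax that Go's RE2-based `regexp` package accepts, so no lookarounds or recursion are needed for this part.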

Your regex has other problems, though. This should work better:

\S+=(\([^("]*(?:[^("]*"(?:[^\\"]|\\["\\])*")*(\)))

https://regex101.com/r/OuDvyX/1

It changes [^(] to [^("] to prevent " from being matched unless it’s part of a complete string.
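A quick way to check that difference in Go is sketched below. The input `k=(a" "b")` is a made-up string with a stray quote (it is not from the question), and the names `first` and `second` are just labels for the two patterns above.

```go
package main

import (
	"fmt"
	"regexp"
)

// first: the pattern that still uses [^(]* after the opening parenthesis.
var first = regexp.MustCompile(`\S+=(\([^(]*(?:[^("]*"(?:[^\\"]|\\["\\])*")(\)))`)

// second: the pattern that uses [^("]* and repeats the string group.
var second = regexp.MustCompile(`\S+=(\([^("]*(?:[^("]*"(?:[^\\"]|\\["\\])*")*(\)))`)

func main() {
	// Hypothetical input: the quote after "a" is not part of a complete string
	// that the pattern can pair up before the closing parenthesis.
	stray := `k=(a" "b")`
	fmt.Println(first.MatchString(stray))  // [^(]* is allowed to swallow the stray quote
	fmt.Println(second.MatchString(stray)) // quotes must pair up, so this is rejected

	// A well-formed pair from the question is accepted by both.
	valid := `end=("a", "b")`
	fmt.Println(first.MatchString(valid), second.MatchString(valid))
}
```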


UPDATE:

@Wiktor Stribiżew commented below:

It still does not support other escape sequences. The first [^("]* is redundant in the current pattern. It won't match between=("a",,,) but will match between=("a",,",") - this is inconsistent. The right regex will match valid double quoted string literals separated with commas and any amount of whitespace between them. The \S+=(\([^(]*(?:[^("]*"(?:[^\\"]|\\["\\])*")(\))) is not the right pattern IMHO

If you really want the regex to be that robust, you should use a parser, but you could fix those problems by using:

\S+=(\((?:[^("]*"(?:[^\\"]|\\.)*"[^("]*)*(\)))
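For reference, here is a minimal Go sketch of how this last pattern could be applied to the line from the question. `extractPairs` is a hypothetical helper name; the only library calls are `regexp.MustCompile` and `FindAllString` from the standard `regexp` package.

```go
package main

import (
	"fmt"
	"regexp"
)

// pairRe is the last pattern from the answer, written as a Go raw string
// so the backslashes reach the regex engine unchanged.
var pairRe = regexp.MustCompile(`\S+=(\((?:[^("]*"(?:[^\\"]|\\.)*"[^("]*)*(\)))`)

// extractPairs (hypothetical helper) returns every key=(...) match in s.
func extractPairs(s string) []string {
	return pairRe.FindAllString(s, -1)
}

func main() {
	// The one-line input from the question, escaped quotes included.
	input := `start=("a", "b") and between=("range(2019, max, to=\"le\")") and end=("a", "b")`

	for i, pair := range extractPairs(input) {
		fmt.Printf("%d. %s\n", i+1, pair)
	}
	// Prints:
	// 1. start=("a", "b")
	// 2. between=("range(2019, max, to=\"le\")")
	// 3. end=("a", "b")
}
```

If only the parenthesized value is needed, `FindAllStringSubmatch` would expose it as capture group 1, since the pattern wraps the whole `(...)` part in a group.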
Anonymous answered Oct 21 '22 13:10