
Regex alternation with a shared sub expression and different prefix and suffix expressions

I have the regex below, which has 3 alternations (see the whole regex below), each with its own prefix and suffix characters. I feel that it repeats itself excessively and would like to simplify it if possible. I am matching values in an improperly formed JSON string so that values without a key can be given indexed keys.

Each alternation should match one prefix/suffix pair around the sub expression. I have 3 pairs at this time, but that might change. With several more pairs, the whole regex would become a nightmare to understand, and any change to the repeated sub expression would have to be made in every alternation.

Question

How might I shorten the whole regex below without repeating the sub expression for each of the listed prefix/suffix pairs?

Sub expression, repeated in each alternation

("(?:[^\\"]+|\\.)*")

Prefix/suffix pairs

  1. { ,
  2. , ,
  3. , }

Whole Regex

/\{("(?:[^\\"]+|\\.)*")(?=,)|,("(?:[^\\"]+|\\.)*")(?=,)|,("(?:[^\\"]+|\\.)*")(?=\})/g

Test Strings

  • {"trailer":"","pallet":"A","date":"11-Dec-15","c","z","a"}
  • {"trailer":"","pallet":"A","a","date":"11-Dec-15"}
  • {"a","trailer":"","pallet":"A","date":"11-Dec-15"}
  • {"a","trailer":"","pallet":"A","date":"11-Dec-15","z\""}
  • {"trailer":"","pallet":"A","11-Dec-15"}
  • {"trailer\"","pallet":"A","11-Dec\"-15","z\""}
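A minimal sketch of how the whole regex behaves on a few of these strings. The helper name keylessValues and the variable names are illustrative only and not part of the original post; it assumes a runtime with String.prototype.matchAll and the ?? operator (for example Node 14 or later).

```javascript
// The whole regex from the question, unchanged.
const wholeRegex = /\{("(?:[^\\"]+|\\.)*")(?=,)|,("(?:[^\\"]+|\\.)*")(?=,)|,("(?:[^\\"]+|\\.)*")(?=\})/g;

// Illustrative helper: returns the quoted values that have no key.
// Each alternation has its own capture group, so only one of groups
// 1 to 3 is defined per match; take whichever one participated.
function keylessValues(input) {
  return Array.from(input.matchAll(wholeRegex), (m) => m[1] ?? m[2] ?? m[3]);
}

// Three of the test strings; backslashes are doubled because these
// are JavaScript string literals.
const samples = [
  '{"trailer":"","pallet":"A","date":"11-Dec-15","c","z","a"}',
  '{"trailer":"","pallet":"A","11-Dec-15"}',
  '{"a","trailer":"","pallet":"A","date":"11-Dec-15","z\\""}',
];

for (const s of samples) {
  console.log(keylessValues(s));
}
```

Because every alternation carries its own copy of the capturing group, the captured value lands in a different group number depending on which prefix/suffix pair matched, which is another cost of the repetition.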

Please limit answers to regex alternations and not JSON validation techniques, as I am trying to gain a better understanding of regex and this is just the example that I am using.

pcnate asked Nov 08 '22 23:11


1 Answer

Whilst the regular expression can be simplified from:

/\{("(?:[^\\"]+|\\.)*")(?=,)|,("(?:[^\\"]+|\\.)*")(?=,)|,("(?:[^\\"]+|\\.)*")(?=\})/g

To:

/{("(?:[^\\"]+|\\.)*")(?=,)|,("(?:[^\\"]+|\\.)*")(?=,)|,("(?:[^\\"]+|\\.)*")(?=})/g

The only change is that the escaping of { and } has been removed, because JavaScript's regex engine does not require it (unless the u flag is used, in which case a lone { or } is a syntax error).

What is not possible in JavaScript is removing the explicitly repeated pattern ("(?:[^\\"]+|\\.)*") from the regex itself.

JavaScript does not support all of the regular expression features that Perl and PCRE-based engines (PHP, C++ with the PCRE library, etc.) support.

For example, in PHP or C++ (using PCRE) you could do this:

{("(?:[^\\"]+|\\.)*")(?=,)|,((?1))(?=,)|,((?1))(?=})

In Perl 5.22 you would need to escape that { again, so it would look something like this:

m/\{("(?:[^\\"]+|\\.)*")(?=,)|,((?1))(?=,)|,((?1))(?=})/g

Here (?1) is a subroutine call: it matches the regex inside capturing group 1 again, which in this case is ("(?:[^\\"]+|\\.)*").
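In JavaScript, where (?1) is unavailable, the compiled pattern still contains the sub expression three times, but the source code does not have to. One option is to assemble the pattern from a string and pass it to the RegExp constructor. A minimal sketch, in which the names subExpression, pairs and buildRegex are illustrative and not taken from the question:

```javascript
// The shared sub expression, written once. String.raw keeps the
// backslashes exactly as they would appear inside a regex literal.
const subExpression = String.raw`("(?:[^\\"]+|\\.)*")`;

// Prefix/suffix pairs from the question. { and } are escaped so the
// pattern would also be valid if the u flag were added.
const pairs = [
  [String.raw`\{`, ','],
  [',', ','],
  [',', String.raw`\}`],
];

// Hypothetical helper: builds one alternative per pair, with the
// suffix as a lookahead, and joins the alternatives with |.
function buildRegex(pairList) {
  const alternatives = pairList.map(
    ([prefix, suffix]) => `${prefix}${subExpression}(?=${suffix})`
  );
  return new RegExp(alternatives.join('|'), 'g');
}

const regex = buildRegex(pairs);

// One of the test strings; the backslash is doubled because this is
// a JavaScript string literal.
const sample = '{"a","trailer":"","pallet":"A","date":"11-Dec-15","z\\""}';
console.log(sample.match(regex));
```

On this sample the built regex finds the same two matches as the original literal, {"a" and ,"z\"". Adding another prefix/suffix pair then means adding one entry to pairs, and changing the sub expression means editing one line.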

Dean Taylor answered Nov 14 '22 22:11