
Regex lookahead non capturing with if/then

Tags: json, regex, perl

I have some broken JSON files that I want to fix. The issue is that one of the fields, AcquisitionDateTime, is malformed:

{
    "AcquisitionDateTime": 2016-04-28T17:09:39.515625,
}

What I want to do is wrap the value in double quotes. I can do that easily with a regex:

perl -pi -e 's/\"AcqDateTime\": (.*),/\"AcqDateTime\": \"\1\",/g' t.json

Now I want to extend the regex so that, if a JSON file is not broken, the value doesn't get wrapped in quotes twice. The problem I'm facing is that I don't know how to combine the lookahead, the if/then conditional, and the capturing groups. Here's my attempt:

Lookahead, if you find a ", then capture what is between it. Else capture everything.
perl -pi -e 's/\"AcqDateTime\": (?(?=\")\"(.*)\"|(.*)),/\"AcqDateTime:\" \"\1\",/g' t.json

This is the part I'm interested in correcting:

Lookahead for a \"  -> if yes, then capture without it. \"(.*)\" Else capture all (.*)
(?(?=\")\"(.*)\"|(.*)),

Would somebody explain to me what I'm doing wrong?

Thanks in advance.

asked Dec 21 '16 16:12 by dangom


2 Answers

A good start to match the time stamp would be

\S+

But that also matches the comma, so we switch to

 [^\s,]+

Now, you want to avoid matching quotes too.

 [^\s",]+

That's all you need.

perl -i -pe's/"AcqDateTime":\s*+\K([^\s",]+)/"$1"/g' t.json
answered Sep 20 '22 09:09 by ikegami


The regex below wraps the value only when it is not already fully quoted. It matches a value that is empty, that is missing a quote at the beginning, or that is missing a quote at the end:

perl -pi -e 's/\"AcqDateTime\": (|(?<!\")[^\"].*|.*[^\"](?!\")),/\"AcqDateTime\": \"\1\",/g' t.json

where (|(?<!\")[^\"].*|.*[^\"](?!\")) includes:

  • an empty string value, as in the case of { "AcquisitionDateTime": } or
  • (?<!\")[^\"].*: a value that doesn't start with a quote, as in { "AcquisitionDateTime": 2016" }, or
  • .*[^\"](?!\"): a value that doesn't end with a quote, as in { "AcquisitionDateTime": "2016 }.
answered Sep 22 '22 09:09 by M A