I have some broken JSON files that I want to fix. The issue is that one of the fields, AcquisitionDateTime, is malformed:
{
"AcquisitionDateTime": 2016-04-28T17:09:39.515625,
}
What I want to do is wrap the value within parentheses. I can do that easily with a regex:
perl -pi -e 's/\"AcqDateTime\": (.*),/\"AcqDateTime\": \"\1\",/g' t.json
Now, I want to extend the regex so that, in case a JSON is not broken, the content doesn't get wrapped twice in "". The problem I'm facing is that I don't know how to mix the lookahead, the if/then statements and the capturing groups. Here's my attempt:
Lookahead, if you find a ", then capture what is between it. Else capture everything.
perl -pi -e 's/\"AcqDateTime\": (?(?=\")\"(.*)\"|(.*)),/\"AcqDateTime:\" \"\1\",/g' t.json
This is the part I'm interested in correcting:
Lookahead for a \" -> if yes, then capture without it. \"(.*)\" Else capture all (.*)
(?(?=\")\"(.*)\"|(.*)),
Would somebody explain to me what I'm doing wrong?
Thanks in advance.
A good start to match the time stamp would be
\S+
But that also matches the comma, so we switch to
[^\s,]+
Now, you want to avoid matching quotes too.
[^\s",]+
That's all you need.
perl -i -pe's/"AcqDateTime":\s*+\K([^\s",]+)/"$1"/g' t.json
The below regex includes a check on partial wrapping of quotes (i.e. only at the beginning or the end of the value), missing wrapping on both ends, or empty value:
perl -pi -e 's/\"AcqDateTime\": (|(?<!\")[^\"].*|.*[^\"](?!\")),/\"AcqDateTime\": \"\1\",/g' t.json
where (|(?<!\")[^\"].*|.*[^\"](?!\"))
includes:
{ "AcquisitionDateTime": }
or (?<!\")[^\"].*
: a value that doesn't start with a quote, as in { "AcquisitionDateTime": 2016" }
, or.*[^\"](?!\")
: a value that doesn't end with a quote, as in { "AcquisitionDateTime": "2016 }
.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With