sed - Back-reference in the match pattern does not work

I need to find dates in the format 2021-06-25T21:17:51Z in (XML) files and replace them with the format 2021-06-25T21:17:51.001Z.

I thought about using a regexp with sed, but back-references do not work.

1.xml could look like this, though the real files have many more fields, and some of them are already in the correct format.

<Doc>
   <PUB_DATE>2021-06-25T21:17:51Z</PUB_DATE><!-- to change -->
   <DATE_COLLECT_100>2021-06-25T21:17:51Z</DATE_COLLECT_100><!-- to change -->

   <DATE_CREATION>2021-06-25T21:17:51.001Z</DATE_CREATION><!-- keep it like this -->
</Doc>

The desired output is:

<Doc>
   <PUB_DATE>2021-06-25T21:17:51.001Z</PUB_DATE><!-- to change -->
   <DATE_COLLECT_100>2021-06-25T21:17:51.001Z</DATE_COLLECT_100><!-- to change -->

   <DATE_CREATION>2021-06-25T21:17:51.001Z</DATE_CREATION><!-- keep it like this -->
</Doc>

Here is my sed command:

$ sed -Ee 's#<(PUB_DATE|DATE_COLLECT_100){1}>([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}T[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2})Z</\1>#<\1>\2.001Z</\1>#' 1.xml

The regexp seems to be OK according to regex101.

Here is a representation of it made with https://regexper.com:

[Image: representation of the regexp]

Are back-references allowed in sed when they are used in the search portion? Am I missing something about sed? Is there a bug?

sed version: well... I don't know; neither sed --version, sed -v, nor man sed gives it. I'm on OS X.
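Because the BSD sed shipped with macOS has no version option, a rough way to tell which flavor is installed is to check whether sed --version is accepted at all. This is a minimal sketch, assuming only that GNU sed understands --version and that BSD sed rejects it as an unknown option; the function name sed_flavor is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report which sed flavor is first on the PATH.
# Assumption: GNU sed accepts --version and exits 0; BSD/macOS sed rejects
# the unknown option and exits non-zero, which is reported as "other".
sed_flavor() {
    if sed --version </dev/null >/dev/null 2>&1; then
        echo "GNU"
    else
        echo "other (likely BSD/macOS)"
    fi
}

sed_flavor
```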

Patrick Ferreira asked Oct 18 '25 17:10


2 Answers

BSD or OS X sed doesn't support the back-reference \1 in the regex pattern.

Your options are Perl:

perl -pe 's#<(PUB_DATE|DATE_COLLECT_100)>(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z</\1>#<\1>\2.001Z</\1>#' 1.xml

Or install GNU sed using the Homebrew package manager and then use:

gsed -E 's#<(PUB_DATE|DATE_COLLECT_100)>([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}T[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2})Z</\1>#<\1>\2.001Z</\1>#' 1.xml
anubhava answered Oct 20 '25 12:10


POSIX defines back-references in a BRE, not in an ERE. You're calling sed with -E to enable EREs, so per POSIX the result is undefined behavior, and YMMV regarding what any given tool will do with it.
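Since POSIX does define back-references for BREs, one way to stay within defined behavior is to drop -E and write the pattern as a BRE. This is a sketch under stated assumptions: POSIX BREs have no alternation, so instead of (PUB_DATE|DATE_COLLECT_100) it matches any element whose entire content is a timestamp without fractional seconds, which is broader than the question asks for. Because it sticks to POSIX BRE features it is expected, though not guaranteed, to behave the same in GNU and BSD sed. The function name add_millis_bre is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the substitution rewritten as a POSIX BRE (no -E).
# In a BRE, \( \) creates a group and \1 may appear in the search pattern.
# Assumption: with no alternation available, any element name is accepted,
# and </\1> requires the closing tag to match the opening one.
add_millis_bre() {
    sed 's#<\([[:alpha:]_][[:alnum:]_]*\)>\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\)Z</\1>#<\1>\2.001Z</\1>#'
}

# DATE_CREATION is untouched: ".001" sits where the pattern expects "Z".
add_millis_bre <<'EOF'
<Doc>
   <PUB_DATE>2021-06-25T21:17:51Z</PUB_DATE><!-- to change -->
   <DATE_COLLECT_100>2021-06-25T21:17:51Z</DATE_COLLECT_100><!-- to change -->

   <DATE_CREATION>2021-06-25T21:17:51.001Z</DATE_CREATION><!-- keep it like this -->
</Doc>
EOF
```

To restrict it to PUB_DATE and DATE_COLLECT_100 only, one option is a separate -e expression per tag name; GNU sed's \| alternation in BREs would also work but is not POSIX.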

You don't need a script that complicated to handle the input you show, though. For example, using any sed that supports EREs via a -E argument (e.g. GNU and BSD sed):

$ sed -E 's/(<(PUB_DATE|DATE_COLLECT_100)>.*:[0-9]+)Z/\1.001Z/' file
<Doc>
   <PUB_DATE>2021-06-25T21:17:51.001Z</PUB_DATE><!-- to change -->
   <DATE_COLLECT_100>2021-06-25T21:17:51.001Z</DATE_COLLECT_100><!-- to change -->

   <DATE_CREATION>2021-06-25T21:17:51.001Z</DATE_CREATION><!-- keep it like this -->
</Doc>

If your real input is more complicated or variable than that, you should be using an XML-aware tool such as xmlstarlet instead of sed anyway.

Ed Morton answered Oct 20 '25 11:10


