I have an XML file in which the order of the tags must be preserved, but a tag called media appears as duplicate lines in consecutive order. I would like to delete one of each pair of duplicate media lines while preserving all of the parent tags (which are also consecutive and repeat). Is there an awk solution that deletes a line only if a pattern is matched? For example:
<story>
<article>
<media>One line</media>
<media>One line</media> <-- Same line as above, want to delete this
<media>Another Line</media>
<media>Another Line</media> <-- Another duplicate, want to delete this
</article>
</story>
<story>
<article>
........ and so on
I want to keep the consecutive story and article tags and delete duplicates only for the media tag. I've tried a number of awk scripts, but nothing seems to work without sorting the file and ruining the order of the XML. Any help is much appreciated.
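An awk script can do this. It remembers the previous line in f and prints the current line only when it differs from f. Note that this removes any line identical to the one directly before it, not only media lines: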
awk '!(f == $0){print} {f=$0}' input
Test
$ cat input
<story>
<article>
<media>One line</media>
<media>One line</media>
<media>Another Line</media>
<media>Another Line</media>
this
</article>
</story>
<story>
<article>
$ awk '!(f == $0){print} {f=$0}' input
<story>
<article>
<media>One line</media>
<media>Another Line</media>
this
</article>
</story>
<story>
<article>
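OR, more tersely (note that this form also drops lines that are empty or just 0, because the value assigned to f then tests as false):
$ awk 'f!=$0&&f=$0' input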
Thanks to Jidder
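The one-liners above drop every line that repeats the line directly before it, so consecutive identical parent tags would be removed as well. Since the question asks to delete only when a pattern is matched, here is a minimal sketch of a variant that restricts the check to lines containing <media>. The function name dedupe_media and the variable prev are made up for illustration, and the sketch assumes each media element sits on a single line, as in the sample:

```shell
# dedupe_media is a hypothetical helper name, not part of the original answer.
# It reads the named files, or standard input when no file is given.
dedupe_media() {
  awk '
    # Skip a line only when it contains <media> AND is identical to the previous line.
    !(/<media>/ && $0 == prev) { print }
    # Remember the current line for the next comparison.
    { prev = $0 }
  ' "$@"
}
```

Calling dedupe_media input prints the file with duplicate media lines collapsed, while repeated story or article lines pass through unchanged.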