Delete duplicate consecutive lines without sort or unique in xml file

Question

I have an XML file in which I need to keep the order of the tags. A tag called media appears as duplicate lines in consecutive order. I would like to delete one of each pair of duplicate media lines while preserving all of the parent tags (which are also consecutive and repeat). Is there an awk solution that deletes a line only if a pattern is matched? For example:

<story>
   <article>
      <media>One line</media>
      <media>One line</media>    <-- Same line as above, want to delete this
      <media>Another Line</media>
      <media>Another Line</media>  <-- Another duplicate, want to delete this
   </article>
</story>
<story>
   <article>
     ........ and so on

I want to keep the consecutive story and article tags and delete duplicates only for the media tag. I've tried a number of awk scripts, but nothing seems to work without sorting the file, which ruins the order of the XML. Any help is much appreciated.

nu11p01n73R · Accepted Answer

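awk can do this by remembering the previous line in a variable and printing the current line only when it differs from it, so the order of the file is preserved: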

awk '!(f == $0){print} {f=$0}' input

Test

$ cat input
<story>
   <article>
      <media>One line</media>
      <media>One line</media>
      <media>Another Line</media>
      <media>Another Line</media>
this
   </article>
</story>
<story>
   <article>

$ awk '!(f == $0){print} {f=$0}' input
<story>
   <article>
      <media>One line</media>
      <media>Another Line</media>
this
   </article>
</story>
<story>
   <article>

OR

$ awk 'f!=$0&&f=$0' input

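Thanks to Jidder for this shorter form. Note that it also drops every empty line, because the assignment f=$0 then evaluates to false; prefer the first command if the file contains blank lines.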
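
Both commands above remove any line that repeats the line before it, whatever the tag. If the deletion should apply only when the media pattern is matched, as the question asks, a minimal sketch could look like the following. The file name sample.xml and the variable name prev are only illustrative, and the sketch assumes each media element sits on a single line, as in the example:

```shell
# Hypothetical sample file; the name sample.xml is only for illustration.
cat > sample.xml <<'EOF'
<story>
   <article>
      <media>One line</media>
      <media>One line</media>
      <media>Another Line</media>
      <media>Another Line</media>
   </article>
</story>

<story>
EOF

# Skip a line only when it contains <media> AND is identical to the
# previous line. Every other line (parent tags, blank lines) is printed.
awk '/<media>/ && $0 == prev { next } { print; prev = $0 }' sample.xml
```

Here the two repeated media lines are dropped, while the story and article tags and the blank line pass through unchanged. Because the comparison is against the whole previous line, two media lines that differ only in indentation would both be kept.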

Tags: regex, bash, xml, sed, awk

user1625714
