
Delete duplicate lines only if they match a pattern

This question has a great answer saying you can use awk '!seen[$0]++' file.txt to delete non-consecutive duplicate lines from a file. How can I delete non-consecutive duplicate lines from a file only if they match a pattern? e.g. only if they contain the string "#####"

Example input

deleteme.txt ##########
1219:                            'PCM BE PTP'
deleteme.txt ##########
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1223:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1225:                          , 'PCM FE/MID PTP'

Desired output

deleteme.txt ##########
1219:                            'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1223:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1225:                          , 'PCM FE/MID PTP'
IceCreamToucan asked Mar 03 '19 18:03


1 Answer

You may use

awk '!/#####/ || !seen[$0]++'

Or, as Ed Morton suggests, the equivalent

awk '!(/#####/ && seen[$0]++)'

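Here, !seen[$0]++ does the same thing as usual: it is true only the first time a given line is seen, so on its own it would remove every duplicated line. The !/#####/ part is true for lines that do not contain #####. Combined with ||, the condition prints every line without ##### unconditionally and prints a line with ##### only on its first occurrence, so only duplicate lines containing ##### are removed. Because || short-circuits, lines without ##### are never recorded in seen.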
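The same logic can be spelled out with an explicit condition, which may be easier to adapt. This is only a sketch: dedupe_matching is a hypothetical helper name, and passing the pattern through -v pat is an assumption about how you might want to parameterise it (awk processes backslash escapes in -v values, so a plain pattern such as ##### is the safe case).

```shell
# dedupe_matching PATTERN: hypothetical helper that copies stdin to stdout,
# dropping repeated lines only when they match PATTERN (an awk regex).
dedupe_matching() {
  awk -v pat="$1" '
    $0 ~ pat {            # the line matches the pattern
      if (seen[$0]++)     # it was already printed once, so skip it
        next
    }
    { print }             # all other lines pass through unchanged
  '
}

# Small made-up input: the repeated "a #####" is dropped, the repeated "x" is kept.
printf '%s\n' 'a #####' 'x' 'a #####' 'x' 'b #####' | dedupe_matching '#####'
```

With the question's data, the call would be dedupe_matching '#####' < file.txt.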

See an online awk demo:

s="deleteme.txt ##########
1219:                            'PCM BE PTP'
deleteme.txt ##########
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1223  #####:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1225:                          , 'PCM FE/MID PTP'"
awk '!/#####/ || !seen[$0]++' <<< "$s"

Output:

deleteme.txt ##########
1219:                            'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1223  #####:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1225:                          , 'PCM FE/MID PTP'
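As a quick sanity check that the two forms given above behave the same, both can be run on one input and their outputs compared. The sample lines here are made up for illustration and are not taken from the question.

```shell
# Made-up sample: "h1 #####" appears three times, "body" twice.
input=$(printf '%s\n' 'h1 #####' 'body' 'h1 #####' 'body' 'h2 #####' 'h1 #####')

# Form 1: print if the line has no #####, or if it is seen for the first time.
a=$(printf '%s\n' "$input" | awk '!/#####/ || !seen[$0]++')

# Form 2: skip only when the line has ##### and was already seen.
b=$(printf '%s\n' "$input" | awk '!(/#####/ && seen[$0]++)')

if [ "$a" = "$b" ]; then
  echo "same output"
else
  echo "outputs differ"
fi
```

The two conditions are related by De Morgan's law, so they should agree on any input.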
Wiktor Stribiżew answered Oct 01 '22 01:10