Remove duplicate function block using 'awk'/Python (Generic solution)

Question

I have a text file containing several function blocks, some of which are duplicates. I want to create a new file that contains only the unique function blocks. For example, input.txt (I have updated the example):

Func (a1,b1) abc1
{
xyz1;
    {
        xy1;
    }

xy1;
}

Func (a2,b2) abc2
{
xyz2;
    {
        xy2;
        rst2;
    }

xy2;
}

Func (a1,b1) abc1
{
xyz1;
    {
        xy1;
    }

xy1;
}

Func (a3,b3) abc3
{
xyz3;
    {
        xy3;
        rst3;
        def3;
    }

xy3;
}
    Func (a1,b1) abc1
{
xyz1;
    {
        xy1;
    }

xy1;
}

And I want output.txt to be:

Func (a1,b1) abc1
{
xyz1;
    {
        xy1;
    }

xy1;
}

Func (a2,b2) abc2
{
xyz2;
    {
        xy2;
        rst2;
    }

xy2;
}

Func (a3,b3) abc3
{
xyz3;
    {
        xy3;
        rst3;
        def3;
    }

xy3;
}

I found a solution using awk that removes duplicate lines, something like:

$ awk '!a[$0]++' input.txt > output.txt

But the issue is that the above solution matches only a single line, not a text block. I wanted to combine this awk solution with a regex that matches a single function block: '/^FUNC(.| )*? }/'

But I was not able to do that. Any suggestion/solution would be very helpful.

Ed Morton · Accepted Answer

$ awk '$1=="Func"{ f=!seen[$NF]++ } f' file
Func (a1,b1) abc1
{
xyz1;
    {
        xy1;
    }

xy1;
}

Func (a2,b2) abc2
{
xyz2;
    {
        xy2;
        rst2;
    }

xy2;
}

Func (a3,b3) abc3
{
xyz3;
    {
        xy3;
        rst3;
        def3;
    }

xy3;
}

The above just assumes that every Func definition is on its own line and that line ends with the function name.

All it does is look for a "Func" line and set a flag f to true if this is the first time we've seen the function name at the end of that line, and to false otherwise. This uses the common awk idiom !seen[$NF]++, which you were already using in your question with the array named a[]. It then prints the current line if f is true (i.e. you're following the Func definition of a previously unseen function name) and skips it otherwise (i.e. you're following the Func definition of a function name that has been seen previously).
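Since the question also asks about Python, here is a rough sketch of the same flag logic in Python. It is not part of the original answer: the function name unique_func_blocks, the keyword parameter, and the shortened sample text are made up for illustration. It makes the same assumptions as the awk script, namely that each Func definition sits on its own line and that line ends with the function name.

```python
def unique_func_blocks(lines, keyword="Func"):
    """Yield only the lines of the first block seen for each function name.

    Hypothetical helper mirroring: awk '$1=="Func"{ f=!seen[$NF]++ } f'
    """
    seen = set()   # function names encountered so far (awk: seen[])
    keep = False   # print flag (awk: f), false until the first Func line
    for line in lines:
        fields = line.split()  # whitespace split, like awk's default fields
        if fields and fields[0] == keyword:
            name = fields[-1]          # last field (awk: $NF)
            keep = name not in seen    # true only on the first occurrence
            seen.add(name)
        if keep:
            yield line


# Shortened, made-up sample in the same shape as input.txt.
sample = """Func (a1,b1) abc1
{
xy1;
}

Func (a2,b2) abc2
{
xy2;
}
    Func (a1,b1) abc1
{
xy1;
}
"""

result = "".join(unique_func_blocks(sample.splitlines(keepends=True)))
print(result, end="")
```

To process a real file, pass the open file object instead of the sample lines and write the yielded lines to output.txt. As with the awk version, two blocks with the same name but different bodies would be treated as duplicates, because only the name on the Func line is compared.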

Tags: python, regex, bash, awk

tanzil
