
Remove duplicate function block using 'awk'/Python (Generic solution)

I have a text file containing several function blocks, some of which are duplicates. I want to create a new file that contains only the unique function blocks. For example, input.txt (I have updated the example):

Func (a1,b1) abc1
{
xyz1;
    {
        xy1;
    }

xy1;
}

Func (a2,b2) abc2
{
xyz2;
    {
        xy2;
        rst2;
    }

xy2;
}

Func (a1,b1) abc1
{
xyz1;
    {
        xy1;
    }

xy1;
}

Func (a3,b3) abc3
{
xyz3;
    {
        xy3;
        rst3;
        def3;
    }

xy3;
}
    Func (a1,b1) abc1
{
xyz1;
    {
        xy1;
    }

xy1;
}

And I want output.txt to be:

Func (a1,b1) abc1
{
xyz1;
    {
        xy1;
    }

xy1;
}

Func (a2,b2) abc2
{
xyz2;
    {
        xy2;
        rst2;
    }

xy2;
}

Func (a3,b3) abc3
{
xyz3;
    {
        xy3;
        rst3;
        def3;
    }

xy3;
}

I found one solution using awk to remove duplicate lines, something like:

$ awk '!a[$0]++' input.txt > output.txt

But the issue is that the above solution matches only a single line, not a text block. I wanted to combine this awk solution with a regex that matches a single function block: '/^FUNC(.|\n)*?\n}/'

But I was not able to do that. Any suggestion/solution would be very helpful.

Asked by tanzil, Mar 26 '26 02:03


1 Answer

$ awk '$1=="Func"{ f=!seen[$NF]++ } f' file
Func (a1,b1) abc1
{
xyz1;
    {
        xy1;
    }

xy1;
}

Func (a2,b2) abc2
{
xyz2;
    {
        xy2;
        rst2;
    }

xy2;
}

Func (a3,b3) abc3
{
xyz3;
    {
        xy3;
        rst3;
        def3;
    }

xy3;
}

The above just assumes that every Func definition is on its own line and that line ends with the function name.
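If that assumption doesn't hold for your file, e.g. two blocks share a name but have different bodies, one alternative is to compare whole blocks instead of names. Here is a rough Python sketch of that idea (it is not what the awk command above does). The helper names split_blocks and dedupe_by_content are made up for illustration, and the sketch assumes that a new block starts at any line whose first word is Func and that indentation and blank lines don't matter when comparing blocks:

```python
def split_blocks(text):
    """Split text into chunks, each starting at a line whose first word is 'Func'."""
    blocks, current = [], []
    for line in text.splitlines():
        # A "Func" line closes the chunk collected so far and opens a new one.
        if line.split()[:1] == ["Func"] and current:
            blocks.append(current)
            current = []
        current.append(line)
    if current:
        blocks.append(current)
    return blocks


def dedupe_by_content(text):
    """Keep the first occurrence of each block, comparing whole block contents."""
    seen = set()
    kept = []
    for block in split_blocks(text):
        # Assumption: indentation and blank lines are ignored when comparing.
        key = tuple(s for s in (ln.strip() for ln in block) if s)
        if key not in seen:
            seen.add(key)
            kept.append("\n".join(block))
    return "\n".join(kept)
```

Note that, unlike the awk command, any text before the first Func line is treated as a chunk of its own and kept. You would call it as something like dedupe_by_content(open("input.txt").read()) and write the result to output.txt.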

The awk script looks for a "Func" line and sets a flag f to true if this is the first time the function name at the end of that line has been seen, and to false otherwise. It uses the common awk idiom !seen[$NF]++, the same one you were already using in your question, just with the array named seen[] instead of a[] and keyed on the last field instead of the whole line. It then prints the current line if f is true (i.e. you're inside the block of a function name not seen before) and skips it otherwise (i.e. you're inside the block of a function name that had already been seen).
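Since the question also mentions Python, the same flag logic could be sketched there roughly as follows. The function name unique_func_blocks and the variable keep are just illustrative; it makes the same assumptions as the awk command (every Func definition is on its own line and that line ends with the function name):

```python
def unique_func_blocks(lines):
    """Keep only the lines that follow the first 'Func' line of each function name."""
    seen = set()    # function names encountered so far (awk: seen[])
    keep = False    # plays the role of the awk flag f
    out = []
    for line in lines:
        fields = line.split()    # whitespace split, like awk's default fields
        if fields and fields[0] == "Func":
            name = fields[-1]    # last field, like $NF
            keep = name not in seen
            seen.add(name)
        if keep:
            out.append(line)
    return out


sample = """Func (a1,b1) abc1
{
xyz1;
}
    Func (a1,b1) abc1
{
xyz1;
}
Func (a2,b2) abc2
{
xyz2;
}
""".splitlines(keepends=True)

# Prints the abc1 block once, followed by the abc2 block.
print("".join(unique_func_blocks(sample)), end="")
```

As with the awk version, anything before the first Func line is dropped because the flag starts out false.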

Answered by Ed Morton, Mar 27 '26 16:03


