I have a text file containing several function blocks, some of which are duplicates. I want to create a new file that contains only the unique function blocks. For example, given input.txt (I have updated the example):
Func (a1,b1) abc1
{
xyz1;
{
xy1;
}
xy1;
}
Func (a2,b2) abc2
{
xyz2;
{
xy2;
rst2;
}
xy2;
}
Func (a1,b1) abc1
{
xyz1;
{
xy1;
}
xy1;
}
Func (a3,b3) abc3
{
xyz3;
{
xy3;
rst3;
def3;
}
xy3;
}
Func (a1,b1) abc1
{
xyz1;
{
xy1;
}
xy1;
}
I want output.txt to be:
Func (a1,b1) abc1
{
xyz1;
{
xy1;
}
xy1;
}
Func (a2,b2) abc2
{
xyz2;
{
xy2;
rst2;
}
xy2;
}
Func (a3,b3) abc3
{
xyz3;
{
xy3;
rst3;
def3;
}
xy3;
}
I found a solution that uses awk to remove duplicate lines, something like:
$ awk '!a[$0]++' input.txt > output.txt
The issue is that this solution matches only single lines, not whole text blocks. I wanted to combine this awk solution with a regex that matches a single function block: '/^FUNC(.|\n)*?\n}/'
But I was not able to get that to work. Any suggestion or solution would be very helpful.
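To see the problem concretely, here is a small sketch (the wrapper name dedup_lines and the sample data are hypothetical) that applies the same !a[$0]++ idiom to two copies of one short block. The second copy disappears, but lines that repeat inside the first block (the inner {, the second xy1;, and the final }) are dropped as well, so the block itself gets mangled.

```shell
#!/bin/sh
# dedup_lines: hypothetical wrapper around the line-level idiom from the question.
# a[$0]++ counts each distinct line; !a[$0]++ is true only on its first occurrence,
# so every line is compared on its own, with no notion of a block.
dedup_lines() {
    awk '!a[$0]++' "$@"
}

# Hypothetical sample: the same small block repeated twice.
printf '%s\n' 'Func (a1,b1) abc1' '{' 'xyz1;' '{' 'xy1;' '}' 'xy1;' '}' \
              'Func (a1,b1) abc1' '{' 'xyz1;' '{' 'xy1;' '}' 'xy1;' '}' |
    dedup_lines
```

Only five lines survive (Func (a1,b1) abc1, {, xyz1;, xy1;, }), which is why a line-based key cannot express "same block".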
$ awk '$1=="Func"{ f=!seen[$NF]++ } f' file
Func (a1,b1) abc1
{
xyz1;
{
xy1;
}
xy1;
}
Func (a2,b2) abc2
{
xyz2;
{
xy2;
rst2;
}
xy2;
}
Func (a3,b3) abc3
{
xyz3;
{
xy3;
rst3;
def3;
}
xy3;
}
The above assumes that every Func definition is on its own line and that the line ends with the function name.
All it does is look for a "Func" line and then set a flag f to true if this is the first time we've seen the function name at the end of that line, and to false otherwise (using the common awk idiom !seen[$NF]++, which you were already using in your question with an array named a[]). It then prints the current line if f is true (i.e., you're inside the Func definition of a previously unseen function name) and skips it otherwise (i.e., you're inside the Func definition of a function name that had already been seen).
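The one-liner above keys on the function name alone, so two Func blocks with the same name but different bodies would be collapsed into one. If you instead want to drop only blocks whose entire text is identical (closer to the block-matching idea in the question), one option is to buffer each block and use the block text itself as the seen[] key. This is a minimal sketch, assuming every block starts with a line whose first field is Func; the helper names dedup_blocks and emit_block are hypothetical, and any blank lines between blocks are treated as part of the preceding block's text.

```shell
#!/bin/sh
# dedup_blocks: hypothetical helper that keeps the first copy of each Func block,
# comparing the block's full text (header line plus body) rather than just the name.
# Assumes each block begins with a line whose first field is "Func".
# Usage: dedup_blocks input.txt > output.txt
dedup_blocks() {
    awk '
        # Print the buffered block only if this exact text has not been seen yet.
        function emit_block() {
            if (buf != "" && !seen[buf]++)
                printf "%s", buf
            buf = ""
        }
        $1 == "Func" { emit_block() }   # a new block starts: emit the previous one
        { buf = buf $0 "\n" }           # append the current line to the current block
        END { emit_block() }            # emit the final block
    ' "$@"
}
```

Because whole blocks are held in memory as array keys, this trades some memory for an exact comparison; for files like the one in the question that cost is negligible.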