Let's assume I have the following file, text.txt:
@something
@somethingelse
@anotherthing
1
2
2
3
3
3
4
4
4
5
5
6
7
7
8
9
9
9
10
11
11
11
14
15
I want to split this into multiple files of 5 data rows each, but if the next row has the same number as the previous one, it should still end up in the same file (so a file can hold more than 5 rows). The header should be in every file, but it could also be ignored and reintroduced later.
This means something like this:
text.txt.1
@something
@somethingelse
@anotherthing
1
2
2
3
3
3
text.txt.2
@something
@somethingelse
@anotherthing
4
4
4
5
5
text.txt.3
@something
@somethingelse
@anotherthing
6
7
7
8
9
9
9
text.txt.4
@something
@somethingelse
@anotherthing
10
11
11
11
14
text.txt.5
@something
@somethingelse
@anotherthing
15
So I was thinking about something like this:
awk 'NR%5==1 && $1!=prev{i++;prev=$1}{print > FILENAME"."i}' test.txt
Both conditions work on their own, but not together. Is this possible using awk?
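Running the attempt on the sample data shows why the two conditions do not combine. NR counts every input line, the three header lines included, and prev is only updated when a new file is opened (five lines earlier), so for this data $1!=prev is true at every NR%5==1 boundary and never postpones a split. A minimal sketch that reproduces this in a scratch directory; the only change to the command is that the output file name is parenthesized so that every awk parses the redirection the same way.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory for the generated files

# Recreate the sample input from the question.
printf '%s\n' @something @somethingelse @anotherthing \
  1 2 2 3 3 3 4 4 4 5 5 6 7 7 8 9 9 9 10 11 11 11 14 15 > text.txt

# The attempt from the question, with the output file name parenthesized.
awk 'NR%5==1 && $1!=prev{i++;prev=$1}{print > (FILENAME"."i)}' text.txt

head text.txt.*
```

This writes six files: the header lands only in text.txt.1 (together with 1 and 2), text.txt.2 holds 2 3 3 3 4, and the runs of 2, 4 and 9 are each split across two files.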
Nice question.
With your example, this would work:
awk 'BEGIN{i=1;}/\@/{header= header == ""? $0 : header "\n" $0; next}c>=5 && $1!=prev{i++;c=0;}{if(!c) print header>FILENAME"."i; print > FILENAME"."i;c++;prev=$1;}' test.txt
You need to strip the header out and keep your own counter (c in the above). NR is just the current line number of the input (header lines included), so NR%5 stops matching the chunk boundaries as soon as a chunk holds a number of lines that is not a multiple of 5.
Here it is broken up into several lines and improved a little:
awk 'BEGIN{i=1;}
/\@/{header= header == ""? $0 : header ORS $0; next}
c>=5 && $1!=prev{i++;c=0;}
!c {print header>FILENAME"."i;}
{print > FILENAME"."i;c++;prev=$1;}
' test.txt
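To solve the potential problems mentioned in the comments (an unparenthesized expression after > is parsed differently by different awks, and output files that are never closed can exceed the limit on open files):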
awk 'BEGIN{i=1}
/\@/{header= header == ""? $0 : header ORS $0; next}
c>=5 && $1!=prev{i++;c=0}
!c {close(f);f=(FILENAME"."i);print header>f}
{print>f;c++;prev=$1}
' test.txt
Or see Ed's answer, which is more precise and portable across platforms and awk versions.
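The core of this answer is a row counter that is reset only when the current file already has at least 5 rows and the value has changed. A minimal sketch of the same counter technique, assuming the header lines are exactly those starting with @; the chunk size is passed in with -v as n, a hypothetical variable name not used in the answer.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory for the generated files

# Recreate the sample input from the question.
printf '%s\n' @something @somethingelse @anotherthing \
  1 2 2 3 3 3 4 4 4 5 5 6 7 7 8 9 9 9 10 11 11 11 14 15 > text.txt

# n is the minimum number of data rows per file (hypothetical name).
awk -v n=5 '
# Collect the header lines and keep them out of the row count.
/^@/ { header = (header == "" ? $0 : header ORS $0); next }

# The current file has at least n rows and the value changed: start a new file.
c >= n && $1 != prev { c = 0 }

# Open the next output file, closing the previous one first.
c == 0 {
    close(f)
    f = FILENAME "." (++i)
    print header > f
}

# Copy the row and remember its value for the next comparison.
{ print > f; c++; prev = $1 }
' text.txt

head text.txt.*
```

With n=5 this produces the five files listed in the question; changing n changes the chunk size without touching the program.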
Using any awk in any shell on every Unix box:
$ cat tst.awk
/^@/ {
    hdr = hdr $0 ORS
    next
}
( (++numLines) % 5 ) == 1 {
    if ( $0 == prev ) {
        --numLines
    }
    else {
        close(out)
        out = FILENAME "." (++numBlocks)
        printf "%s", hdr > out
        numLines = 1
    }
}
{
    print > out
    prev = $0
}
$ awk -f tst.awk text.txt
$ head text.txt.*
==> text.txt.1 <==
@something
@somethingelse
@anotherthing
1
2
2
3
3
3
==> text.txt.2 <==
@something
@somethingelse
@anotherthing
4
4
4
5
5
==> text.txt.3 <==
@something
@somethingelse
@anotherthing
6
7
7
8
9
9
9
==> text.txt.4 <==
@something
@somethingelse
@anotherthing
10
11
11
11
14
==> text.txt.5 <==
@something
@somethingelse
@anotherthing
15
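The --numLines branch is what keeps runs together: when the line that would start a new block equals the previous line, the counter is stepped back, so the same boundary test is repeated on the next line and a run of any length stays in one file. A small sketch that exercises this, reusing tst.awk unchanged on a hypothetical input run.txt with a single header line (@hdr) and a run of seven identical values, longer than a whole block.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory for the generated files

# Same script as tst.awk above.
cat > tst.awk <<'EOF'
/^@/ {
    hdr = hdr $0 ORS
    next
}
( (++numLines) % 5 ) == 1 {
    if ( $0 == prev ) {
        --numLines
    }
    else {
        close(out)
        out = FILENAME "." (++numBlocks)
        printf "%s", hdr > out
        numLines = 1
    }
}
{
    print > out
    prev = $0
}
EOF

# Hypothetical input: seven 1s in a row, followed by five distinct values.
printf '%s\n' @hdr 1 1 1 1 1 1 1 2 3 4 5 6 > run.txt

awk -f tst.awk run.txt
head run.txt.*
```

All seven 1s end up in run.txt.1, and 2 through 6 form run.txt.2.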
With your shown samples, please try the following awk program. Written and tested in GNU awk (it relies on PROCINFO["sorted_in"], which is specific to GNU awk).
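awk '
BEGIN{
  outFile="text.txt."     ##Output files are text.txt.1, text.txt.2, ...
  count=1
}
/@/{                      ##Collect the header lines and skip them.
  header=(header?header ORS:"")$0
  next
}
{                         ##Group identical values together.
  arr[$0]=(arr[$0]?arr[$0] ORS:"")$0
}
END{
  PROCINFO["sorted_in"] = "@ind_num_asc"   ##GNU awk only: visit indices in numeric order.
  for(i in arr){
    if(len==0){           ##First group of a new file: write the header first.
      print header > (outFile count)
    }
    num=split(arr[i],arr2,"\n")
    print arr[i] > (outFile count)
    len+=num
    if(len>=5){           ##At least 5 data rows written: move on to the next file.
      close(outFile count)
      count++
      len=0
    }
  }
}
' Input_file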
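The question also allows leaving the header out during the split and adding it back afterwards. A minimal sketch of that route in portable sh and awk, assuming the header lines are exactly the ones starting with @; header.tmp and the *.tmp files are hypothetical scratch names.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory for the generated files

# Recreate the sample input from the question.
printf '%s\n' @something @somethingelse @anotherthing \
  1 2 2 3 3 3 4 4 4 5 5 6 7 7 8 9 9 9 10 11 11 11 14 15 > text.txt

# Step 1: save the header lines.
grep '^@' text.txt > header.tmp

# Step 2: split only the data rows; a new file starts after at least
# 5 rows once the value changes.
awk '/^@/ { next }
     c >= 5 && $1 != prev { c = 0 }
     c == 0 { close(f); f = FILENAME "." (++i) }
     { print > f; c++; prev = $1 }' text.txt

# Step 3: put the header back at the top of every piece.
for f in text.txt.*; do
    cat header.tmp "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

head text.txt.*
```

Keeping the header out of the awk program makes the split logic shorter, at the cost of rewriting each output file once.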