
Shell script numbering lines in a file

Tags: shell, sed, awk

I need a faster way to number the lines in a file using tools like awk and sed. Each line should be prefixed with a single number that cycles in this fashion: 1, 2, 3, 1, 2, 3, 1, 2, 3, and so on.

For example, if the input was this:

line 1
line 2
line 3
line 4
line 5
line 6
line 7

The output needs to look like this:

1line 1
2line 2
3line 3
1line 4
2line 5
3line 6
1line 7

Here is a chunk of what I have. $lines is the number of lines in the data file divided by 3. So for a file of 21000 lines I process this loop 7000 times.

export i=0
while [ $i -le $lines ]
do
    export start=`expr $i \* 3 + 1`
    export end=`expr $start + 2`
    awk NR==$start,NR==$end $1 | awk '{printf("%d%s\n", NR,$0)}' >> data.out
    export i=`expr $i + 1`
done

Basically this grabs 3 lines at a time, numbers them, and adds to an output file. It's slow...and then some! I don't know of another, faster, way to do this...any thoughts?

Douglas Anderson asked Nov 28 '22 05:11



2 Answers

Try the nl command.

See https://linux.die.net/man/1/nl (or another link to the documentation that comes up when you Google for "man nl" or the text version that comes up when you run man nl at a shell prompt).

The nl utility reads lines from the named file, or from standard input if the file argument is omitted, applies a configurable line-numbering filter operation, and writes the result to standard output.

edit: No, that's wrong, my apologies. The nl command doesn't have an option for restarting the numbering every n lines, it only has an option for restarting the numbering after it finds a pattern. I'll make this answer a community wiki answer because it might help someone to know about nl.
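
To make that limitation concrete, here is a quick check on the sample input from the question. It is only a sketch with nl's default options; the `printf` call is just a stand-in for the real data file.

```shell
# Feed the seven sample lines to nl with its default settings.
# The numbers run straight through to 7; nothing restarts after every third line.
printf 'line %d\n' 1 2 3 4 5 6 7 | nl
```

By default nl also pads the number and follows it with a separator, so even the basic layout differs from what the question asks for.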

2 revs answered Nov 29 '22 17:11



It's slow because you are reading the same lines over and over. Also, you are starting up an awk process only to shut it down and start another one. Better to do the whole thing in one shot:

awk '{print ((NR-1)%3)+1 $0}' "$1" > data.out

If you prefer to have a space after the number:

awk '{print ((NR-1)%3)+1, $0}' "$1" > data.out
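
To see why the modulo expression cycles, note that `NR-1` runs 0, 1, 2, 3, ... so `(NR-1)%3` runs 0, 1, 2, 0, ... and adding 1 gives 1, 2, 3, 1, ... The sketch below wraps the same idea in a small function so the cycle length can be changed. The function name `number_cyclic` and the variable `n` are made up for illustration and are not part of the answer above; `printf` is used instead of `print` only to make the "number immediately followed by the line" format explicit.

```shell
# Hypothetical helper: prefix every line with a counter that cycles 1..n.
# Usage: number_cyclic N [file...]   (reads standard input if no file is given)
number_cyclic() {
    n=$1
    shift
    # (NR - 1) % n yields 0..n-1; adding 1 shifts it to 1..n.
    awk -v n="$n" '{ printf "%d%s\n", (NR - 1) % n + 1, $0 }' "$@"
}

# Try it on the sample input from the question.
printf 'line %d\n' 1 2 3 4 5 6 7 | number_cyclic 3
```

The whole file is read once by a single awk process, which is where the speedup over the loop in the question comes from.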

Jon Ericson answered Nov 29 '22 17:11
