Input file (input.txt):
12345678,Manoj,23,Developer
12345678,Manoj,34,Developer
12345678,Manoj,67,Developer
12345679,Vijay,12,Tester
12345679,Vijay,98,Tester
12345676,Samrat,100,Manager
12345676,Samrat,25,Manager
12345676,Samrat,28,Manager
Desired output file:
12345678,Manoj,23,Developer,0
12345678,Manoj,34,Developer,1
12345678,Manoj,67,Developer,2
12345679,Vijay,12,Tester,0
12345679,Vijay,98,Tester,1
12345676,Samrat,100,Manager,0
12345676,Samrat,25,Manager,1
12345676,Samrat,28,Manager,2
Explanation:
The first value (12345678) is the same in the first 3 lines of my input file, so those lines should be suffixed with ,0, ,1 and ,2 respectively. The same applies to each following group of lines.
How can this be done in a shell script?
Edit to the desired output:
Is it also possible to change the number format of the desired output to the following?
12345678,Manoj,23,Developer,0000000
12345678,Manoj,34,Developer,0000001
12345678,Manoj,67,Developer,0000002
12345679,Vijay,12,Tester,0000000
12345679,Vijay,98,Tester,0000001
12345676,Samrat,100,Manager,0000000
12345676,Samrat,25,Manager,0000001
12345676,Samrat,28,Manager,0000002
New: Is it possible to start the numbering from 0000019? Alternatively, is there a way to initialize a variable (for example a=5, a=19 or a=39) and increment from there?
12345678,Manoj,23,Developer,0000019
12345678,Manoj,34,Developer,0000020
12345678,Manoj,67,Developer,0000021
12345679,Vijay,12,Tester,0000019
12345679,Vijay,98,Tester,0000020
12345676,Samrat,100,Manager,0000019
12345676,Samrat,25,Manager,0000020
12345676,Samrat,28,Manager,0000021
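On the "shell script" part of the question: per-key counting is usually delegated to awk, which has associative arrays built in. A minimal sketch of wrapping that in a reusable POSIX sh function, assuming a comma-separated file keyed on the first column; the function name number_by_key and the three sample rows are just for illustration:

```shell
#!/bin/sh
# Sketch: append a 0-based running count per first-column ID.
# number_by_key is a hypothetical helper name chosen for this example.
number_by_key() {
  # Inside the single-quoted awk program, $1 is awk's first field;
  # the "$1" after the program is the shell function's first argument.
  awk 'BEGIN { FS = OFS = "," }
       { sub(/\r$/, "")          # tolerate Windows (CRLF) line endings
         print $0, seen[$1]++ }  # seen[$1] is 0 the first time an ID appears
      ' "$1"
}

cat > input.txt <<'EOF'
12345678,Manoj,23,Developer
12345678,Manoj,34,Developer
12345679,Vijay,12,Tester
EOF

number_by_key input.txt > output.txt
cat output.txt
```

The counter lives in an awk array indexed by the ID, so the input does not need to be sorted or grouped for this to work.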
Using awk:
$ awk 'BEGIN{FS=OFS=",";RS="\r?\n"}{print $0,a[$1]++}' file
Output:
12345678,Manoj,23,Developer,0
12345678,Manoj,34,Developer,1
12345678,Manoj,67,Developer,2
12345679,Vijay,12,Tester,0
12345679,Vijay,98,Tester,1
12345676,Samrat,100,Manager,0
12345676,Samrat,25,Manager,1
12345676,Samrat,28,Manager,2
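Setting RS to a regular expression such as "\r?\n" relies on an extension: POSIX leaves the behavior of a multi-character RS unspecified, while gawk and mawk treat it as a regex. A sketch of a more portable variant that keeps the default RS and strips a trailing carriage return from each record instead, fed with deliberately CRLF-terminated test data (the file names crlf.txt and out.txt are only for illustration):

```shell
#!/bin/sh
# Build a small input file with Windows (CRLF) line endings.
printf '12345678,Manoj,23,Developer\r\n12345678,Manoj,34,Developer\r\n12345679,Vijay,12,Tester\r\n' > crlf.txt

# Portable variant: default RS (newline), then remove the trailing \r
# from each record before appending the per-ID count.
awk 'BEGIN { FS = OFS = "," }
     { sub(/\r$/, ""); print $0, a[$1]++ }' crlf.txt > out.txt
cat out.txt
```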
Edit:
As the requirements changed and a lot of commenting took place, here is the final version (revision one, since the requirements differed between the comments and the original post; knocking on wood):
$ awk 'BEGIN{FS=","}{sub(/\r$/,"");printf "%s,%07d" ORS,$0,a[$1]++}' file
Explained:
$ awk '
BEGIN {
    FS=","
    # ORS="\r\n"                     # uncomment if Windows line endings are desired
}
{
    sub(/\r$/,"")                    # remove Windows line endings (i.e. the \r from \r\n)
    printf "%s,%07d" ORS,$0,a[$1]++  # output a zero-padded running count keyed on $1
}' file
Tested with gawk, mawk, busybox awk and the original-awk (awk version 20121220). Oh, and I recycled my Solaris box 5 years ago. ;D
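The commented-out ORS line in the explained version can also be set from the command line instead of editing the program: POSIX awk processes escape sequences in -v assignments, so -v ORS='\r\n' yields a real CR LF pair. A sketch, assuming the output file name win.txt:

```shell
#!/bin/sh
cat > input.txt <<'EOF'
12345678,Manoj,23,Developer
12345679,Vijay,12,Tester
12345679,Vijay,98,Tester
EOF

# -v turns '\r\n' into an actual carriage return + newline.
# The printf format is "%s,%07d" concatenated with ORS,
# so every output line ends in \r\n.
awk -v ORS='\r\n' 'BEGIN { FS = "," }
  { sub(/\r$/, ""); printf "%s,%07d" ORS, $0, a[$1]++ }' input.txt > win.txt
```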
Update: fixed a line-ending error I had not noticed before.
Use this; it works with both \r\n and \n line endings, and the output ends in \n:
awk -F, 'sub(/\r$/,"") ($(NF+1)=sprintf("%07d",a[$2]++))' OFS=, input.txt
Output:
12345678,Manoj,23,Developer,0000000
12345678,Manoj,34,Developer,0000001
12345678,Manoj,67,Developer,0000002
12345679,Vijay,12,Tester,0000000
12345679,Vijay,98,Tester,0000001
12345676,Samrat,100,Manager,0000000
12345676,Samrat,25,Manager,0000001
12345676,Samrat,28,Manager,0000002
I wrote it like that for conciseness; it is functionally equivalent to:
awk 'BEGIN{FS=OFS=","}{sub(/\r$/,"");$(NF+1)=sprintf("%07d",a[$2]++)}1' input.txt
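Why the concise form prints every line: its pattern is the concatenation of sub()'s return value (0 or 1) and the newly assigned field, which is always a non-empty string, so the pattern is always true and the default action prints the rebuilt $0. A sketch that runs both spellings on the same sample and compares the results byte for byte (the file names concise.txt and long.txt are only for illustration):

```shell
#!/bin/sh
cat > input.txt <<'EOF'
12345678,Manoj,23,Developer
12345678,Manoj,34,Developer
12345679,Vijay,12,Tester
EOF

# Concise form: the pattern does all the work; the default action prints $0.
awk -F, 'sub(/\r$/,"") ($(NF+1)=sprintf("%07d",a[$2]++))' OFS=, input.txt > concise.txt

# Long form: explicit statements, then the always-true pattern 1 prints $0.
awk 'BEGIN{FS=OFS=","}{sub(/\r$/,"");$(NF+1)=sprintf("%07d",a[$2]++)}1' input.txt > long.txt

# cmp exits 0 and prints nothing when the files are identical.
cmp concise.txt long.txt && echo "identical"
```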
If you have Ruby installed:
ruby -aF, -pe 'BEGIN{a=Hash.new(-1)};sub(/\r?$/, "," + "%07d" % a[$F[1]]+=1)' input.txt
Same output.
By the way, if you want the numbering to start at 19, you can use this (add 19 to the counter value):
awk 'sub(/\r$/,"") ($(NF+1)=sprintf("%07d",19+a[$2]++))' FS=, OFS=, input.txt
Or this (initialize the hash default to 18, so the first increment yields 19):
ruby -aF, -pe 'BEGIN{a=Hash.new(18)};sub(/\r?$/, "," + "%07d" % a[$F[1]]+=1)' input.txt
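For the follow-up about initializing a variable (a=5, a=19, a=39), the starting value can be passed in with -v rather than hard-coded into the program. A minimal sketch, assuming a hypothetical variable name start and a three-line sample:

```shell
#!/bin/sh
cat > input.txt <<'EOF'
12345678,Manoj,23,Developer
12345678,Manoj,34,Developer
12345679,Vijay,12,Tester
EOF

# start is a hypothetical variable name. a[$2]++ still counts from 0 per key,
# so each key's first line gets start, the next start+1, and so on.
for start in 5 19 39; do
  awk -v start="$start" 'BEGIN { FS = OFS = "," }
    { sub(/\r$/, ""); $(NF+1) = sprintf("%07d", start + a[$2]++); print }' input.txt > "out_$start.txt"
done
```

With start=19 this produces 0000019, 0000020 for Manoj and 0000019 for Vijay, matching the numbering asked for in the question.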
All of these use $2 (column 2) as the key; in your samples $1 and $2 always go together, so either one works.
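If the ID and name columns were ever not in one-to-one correspondence, the choice of key would change the result. A sketch that keys on both columns joined with the field separator, using hypothetical data in which the same ID appears with two different names (the name Ravi is invented for this example):

```shell
#!/bin/sh
# Hypothetical data: the same ID appears with two different names.
cat > mixed.txt <<'EOF'
12345678,Manoj,23,Developer
12345678,Ravi,34,Developer
12345678,Manoj,67,Developer
EOF

# Key on "$1,$2" so the count runs separately for each (ID, name) pair.
awk 'BEGIN { FS = OFS = "," }
     { sub(/\r$/, ""); $(NF+1) = sprintf("%07d", a[$1 FS $2]++); print }' mixed.txt > keyed.txt
cat keyed.txt
```

Keying on $1 alone would number these lines 0000000, 0000001, 0000002; the combined key restarts the count for Ravi.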