I want to insert a delimiter at certain positions in each line of a file.
I have a file with data:
10100100010000
20200200020000
And I know the offset of each column (2, 5 and 9)
With this sed command: sed 's/\(.\{2\}\)/&,/;s/\(.\{6\}\)/&,/;s/\(.\{11\}\)/&,/' myFile
I get the expected output:
10,100,1000,10000
20,200,2000,20000
but with a large number of columns (~200) and rows (300k) it is really slow.
Is there an efficient alternative?
1st solution: With GNU awk
Could you please try the following (FIELDWIDTHS is a GNU awk feature):
awk -v OFS="," '{$1=$1}1' FIELDWIDTHS="2 3 4 5" Input_file
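With ~200 columns, typing the widths by hand gets error-prone. Since the question gives cut offsets rather than widths, here is a minimal sketch that converts one into the other. The helper name offsets_to_widths and the idea of passing the total line length as a second argument are my assumptions, not part of the original answer.

```shell
# Hypothetical helper: turn ascending cut offsets into column widths.
# $1 = space-separated offsets, $2 = total line length (both assumptions).
offsets_to_widths() {
    prev=0
    widths=""
    for off in $1 $2; do
        # Each width is the distance from the previous cut to this one.
        widths="$widths${widths:+ }$((off - prev))"
        prev=$off
    done
    printf '%s\n' "$widths"
}

offsets_to_widths "2 5 9" 14
```

The result can then be handed to GNU awk, for example:
awk -v OFS="," -v FIELDWIDTHS="$(offsets_to_widths "2 5 9" 14)" '{$1=$1}1' Input_file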
2nd solution: Using sed
Try the following.
sed 's/\(..\)\(...\)\(....\)\(.....\)/\1,\2,\3,\4/' Input_file
3rd solution: awk, using substr.
awk 'BEGIN{OFS=","} {print substr($0,1,2) OFS substr($0,3,3) OFS substr($0,6,4) OFS substr($0,10,5)}' Input_file
In the substr solution above, I take 5 characters with substr($0,10,5). If you want everything from the 10th position to the end of the line, use substr($0,10) instead, which prints all the remaining characters of the line.
Output will be as follows.
10,100,1000,10000
20,200,2000,20000
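For a large number of columns, the substr idea can be driven by a list of offsets instead of being written out by hand. The following is a rough sketch only: the function name add_delims and the space-separated, ascending offset list are assumptions of mine, and it uses only standard awk features (split, substr).

```shell
# Hypothetical helper: insert a comma after each cut offset.
# $1 = space-separated, ascending offsets (an assumption); input on stdin.
add_delims() {
    awk -v offsets="$1" '
    BEGIN { n = split(offsets, cut, " ") }   # parse the offsets once
    {
        out = ""; prev = 0
        for (i = 1; i <= n; i++) {
            # Append the column between the previous cut and this one.
            out = out substr($0, prev + 1, cut[i] - prev) ","
            prev = cut[i]
        }
        # Whatever follows the last cut is the final column.
        print out substr($0, prev + 1)
    }'
}

printf '10100100010000\n20200200020000\n' | add_delims "2 5 9"
```

Each line is walked once from left to right, so the cost per line grows with the line length rather than with repeated regex matching.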
Modifying your sed command to add all the separators in one shot would likely make it perform better, although sed only supports back-references \1 to \9, so this form handles at most nine captured groups:
sed 's/^\(.\{2\}\)\(.\{3\}\)\(.\{4\}\)/\1,\2,\3,/' myFile
Or with extended regular expressions:
sed -E 's/(.{2})(.{3})(.{4})/\1,\2,\3,/' myFile
Output:
10,100,1000,10000
20,200,2000,20000
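If you want to stay with sed for many columns, one possible variation (not from the answer above) is to generate the script and insert the commas from right to left using the numeric flag of the s command, which replaces only the Nth match. Working right to left means the original offsets (2, 5, 9) can be used unchanged, because commas are only ever added after the positions still to be processed. The helper name make_sed_script is hypothetical, and I have not measured how this compares to the awk versions, so it is worth timing on real data.

```shell
# Hypothetical helper: build a sed script that inserts commas right to left.
# $1 = space-separated, ascending offsets (an assumption).
make_sed_script() {
    script=""
    for off in $1; do
        # Prepend each command so the largest offset ends up first.
        script="s/./&,/$off${script:+;$script}"
    done
    printf '%s\n' "$script"
}

printf '10100100010000\n20200200020000\n' | sed "$(make_sed_script "2 5 9")"
```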