
Add delimiters at specific indexes

Tags: regex, sed, awk

I want to insert a delimiter at specific character offsets in each line of a file.

I have a file with data:

10100100010000
20200200020000

I know the offset at which each column ends (2, 5 and 9).

With this sed command: sed 's/\(.\{2\}\)/&,/;s/\(.\{6\}\)/&,/;s/\(.\{11\}\)/&,/' myFile

I get the expected output:

10,100,1000,10000
20,200,2000,20000

but with a large number of columns (~200) and rows (300k) it is really slow.

Is there an efficient alternative?

Circo asked Dec 05 '22 10:12


2 Answers

1st solution: With GNU awk, try the following:

awk -v OFS="," '{$1=$1}1' FIELDWIDTHS="2 3 4 5" Input_file
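
With around 200 columns, writing the widths by hand is error-prone. As a rough sketch, the widths could be derived from the known offsets. The helper name offsets_to_widths and the explicit line length of 14 are assumptions made for illustration, not part of the original answer; FIELDWIDTHS itself is a gawk extension, so the last step is guarded.

```shell
# Hypothetical helper: turn cumulative offsets plus the total line length
# into the space-separated widths that gawk's FIELDWIDTHS expects.
# Usage: offsets_to_widths LINE_LENGTH OFFSET...
offsets_to_widths() {
    total=$1
    shift
    prev=0
    widths=""
    for off in "$@" "$total"; do
        # Each width is the distance from the previous offset to this one
        widths="$widths$((off - prev)) "
        prev=$off
    done
    # Drop the trailing space
    printf '%s\n' "${widths% }"
}

# 14 is the line length of the sample data (an assumption for this sketch)
widths=$(offsets_to_widths 14 2 5 9)
echo "$widths"    # prints: 2 3 4 5

# Only run the gawk step where gawk is actually installed
if command -v gawk >/dev/null 2>&1; then
    printf '10100100010000\n' | gawk -v FIELDWIDTHS="$widths" -v OFS=',' '{ $1 = $1 } 1'
fi
```

The offsets must be given in ascending order, since each width is computed relative to the previous offset.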

2nd solution: Using sed, try the following:

sed 's/\(..\)\(...\)\(....\)\(.....\)/\1,\2,\3,\4/' Input_file

3rd solution: An awk solution using substr:

awk 'BEGIN{OFS=","} {print substr($0,1,2) OFS substr($0,3,3) OFS substr($0,6,4) OFS substr($0,10,5)}' Input_file

In the substr solution above, substr($0,10,5) takes 5 characters starting at position 10. If you want everything from the 10th position to the end of the line, use substr($0,10) instead.

The output will be as follows:

10,100,1000,10000
20,200,2000,20000
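
For many columns, the substr idea can be written as a loop over the offsets rather than one substr call per column. The following is a minimal sketch that should work in any POSIX awk; the function name add_delims and the offsets variable are hypothetical names chosen for this example.

```shell
# Hypothetical helper: insert a comma after each 1-based character offset.
# Usage: add_delims "OFFSET OFFSET ..." < file   (offsets in ascending order)
add_delims() {
    awk -v offsets="$1" '
    BEGIN { n = split(offsets, off, " ") }
    {
        out = ""
        prev = 0
        for (i = 1; i <= n; i++) {
            # Characters between the previous offset and this one, then a comma
            out = out substr($0, prev + 1, off[i] - prev) ","
            prev = off[i]
        }
        # Whatever remains after the last offset
        print out substr($0, prev + 1)
    }'
}

printf '10100100010000\n20200200020000\n' | add_delims "2 5 9"
# prints:
# 10,100,1000,10000
# 20,200,2000,20000
```

This sketch assumes every line is at least as long as the last offset; shorter lines would end up with trailing commas.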
RavinderSingh13 answered Dec 17 '22 17:12


Modifying your sed command so that it adds all the separators in one shot would likely make it perform better:

sed 's/^\(.\{2\}\)\(.\{3\}\)\(.\{4\}\)/\1,\2,\3,/' myFile

Or with extended regular expression:

sed -E 's/(.{2})(.{3})(.{4})/\1,\2,\3,/' myFile

Output:

10,100,1000,10000
20,200,2000,20000
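
One caveat when scaling this up: sed back-references only go from \1 to \9, so the single-substitution form covers at most nine captured groups. A possible workaround, sketched below, is to generate one substitution per offset and apply them from the largest offset to the smallest, so that earlier insertions do not shift later positions. It relies on the numeric flag of the s command (s/./&,/N changes only the Nth match, here the Nth character). The helper name build_sed_script is hypothetical, and this sketch makes no claim about being faster than the original command.

```shell
# Hypothetical helper: build a sed script that inserts a comma after each
# given offset. Offsets must be passed in ascending order.
build_sed_script() {
    script=""
    for off in "$@"; do
        # Prepend, so the largest offset ends up first in the script
        script="s/./&,/$off;$script"
    done
    # Drop the trailing semicolon
    printf '%s\n' "${script%;}"
}

script=$(build_sed_script 2 5 9)
echo "$script"    # prints: s/./&,/9;s/./&,/5;s/./&,/2

printf '10100100010000\n' | sed "$script"    # prints: 10,100,1000,10000
```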
Aaron answered Dec 17 '22 17:12