Inserting a "," in a particular position of a text

Tags: regex, sed, awk

(I have pasted the exact text and command I ran, so this may look a bit messy.)

I have a .TXT file that looks like this:

11111111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111111

The outcome I am looking for is:

11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111

The command I tried is:

sed -i 's/\(.\{14\}\)\(.\{7\}\)\(.\{2\}\)\(.\{1\}\)\(.\{3\}\)\(.\{13\}\)\(.\{1\}\)\(.\{8\}\)\(.\{16\}\)\(.\{3\}\)/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,/' SOME.TXT

The outcome I got was:

11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,1111111111111110,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,1111111111111110,111

I have literally no idea why these 0s suddenly popped up, or why the ',' doesn't appear in the position I specified even though it worked halfway.

Is this a bug in sed?


asked Aug 11 '20 10:08

gggert

3 Answers

It prints 0 in the output because sed supports only nine back-references, \1 through \9. \10 is therefore interpreted as \1 followed by a literal 0.
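A small demonstration of that parsing rule, as a sketch: the input string and the brackets are made up for illustration, and each of the ten groups captures a single character so the effect is easy to see.

```shell
# Ten single-character groups; the replacement asks for \10.
# sed reads \10 as back-reference \1 followed by a literal "0",
# so the match "abcdefghij" becomes "[a0]" rather than "[j]".
demo=$(printf 'abcdefghijk\n' | sed 's/\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)/[\10]/')
printf '%s\n' "$demo"   # prints [a0]k
```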

You can solve it easily using the FIELDWIDTHS feature of GNU awk (the trailing *, which captures the rest of the line, needs gawk 4.2 or later):

awk -v OFS=, 'BEGIN { FIELDWIDTHS = "14 7 2 1 3 13 1 8 16 3 *" } {$1 = $1} 1' file

11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
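FIELDWIDTHS is a gawk extension, so if only a POSIX awk is available, the same split can probably be done with substr. This is a minimal sketch, not part of the original answer; the function name split_fixed and the widths variable are hypothetical.

```shell
# Hypothetical helper: cut each line at fixed widths using only POSIX awk features.
split_fixed() {
    awk -v widths="14 7 2 1 3 13 1 8 16 3" '
    BEGIN { n = split(widths, w, " ") }
    {
        out = ""; pos = 1
        for (i = 1; i <= n; i++) {
            # Append the next field and a comma, then advance past it.
            out = out substr($0, pos, w[i]) ","
            pos += w[i]
        }
        # Whatever follows the last declared width becomes the final field.
        print out substr($0, pos)
    }' "$@"
}

# Usage: split_fixed SOME.TXT   (or pipe lines in on stdin)
```

Unlike sed -i, this writes to standard output, so redirect it to a new file if you want to keep the result.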

Just as an academic exercise, here is a working sed solution that uses two substitutions, so neither one needs more than nine groups:

sed -E 's/(.{14})(.{7})(.{2})(.)(.{3})(.{13})(.)(.{8})(.+)/\1,\2,\3,\4,\5,\6,\7,\8,\9/; s/(.+,.{16})(.{3})(.*)/\1,\2,\3/' file


answered Nov 30 '22 02:11

anubhava

sed can't reference capture groups beyond \9, but Perl can:

perl -i -pe 's/(.{14})(.{7})(.{2})(.)(.{3})(.{13})(.)(.{8})(.{16})(.{3})/$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,/' SOME.TXT

answered Nov 30 '22 01:11

choroba

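If you insist on using sed, you can do something like the following. Each s/./&,/N appends a comma after the Nth character, and working from the rightmost position to the leftmost keeps the earlier insertions from shifting the remaining offsets: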

sed 's/./&,/68;s/./&,/65;s/./&,/49;s/./&,/41;s/./&,/40;s/./&,/27;s/./&,/24;s/./&,/23;s/./&,/21;s/./&,/14' test.txt
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
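The offsets in that command are the running totals of the field widths (14, 14+7=21, 21+2=23, and so on), listed largest first. As a sketch, they could be generated rather than computed by hand; the function name widths_to_sed is hypothetical.

```shell
# Hypothetical helper: turn a list of field widths into a sed script of
# "insert a comma after character N" commands, largest offset first.
widths_to_sed() {
    script=""
    pos=0
    for w in "$@"; do
        pos=$((pos + w))
        # Prepend each command so that larger offsets run before smaller ones.
        script="s/./&,/${pos}${script:+;}${script}"
    done
    printf '%s\n' "$script"
}

widths_to_sed 14 7 2 1 3 13 1 8 16 3
# Usage: sed "$(widths_to_sed 14 7 2 1 3 13 1 8 16 3)" test.txt
```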

answered Nov 30 '22 02:11

Maroun
