
POSIX: abcdef to ab bc cd de ef

Tags: shell, posix, sed, awk

Using POSIX sed or awk, I would like to split a line into its overlapping pairs of neighboring characters, so that the second character of each pair is repeated as the first character of the next pair, and print each pair on its own line.

example.txt:

abcd 10001.

Expected result:

ab
bc
cd
d 
 1
10
00
00
01
1.

So far, this is what I have (N.B. omit "--posix" if on macOS). For some reason, adding a literal newline character before \2 does not produce the expected result. Removing the first group and using \1 has the same effect. What am I missing?

sed --posix -E -e 's/(.)(.)/&\2\
/g' example.txt

abb
cdd
100
000
1..
octosquidopus asked Jul 26 '20 16:07


2 Answers

You may use

sed --posix -e 's/./&\
&/g' example.txt | sed '1d;$d'

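The first sed command matches every character on the line and replaces it with itself, a newline, and itself again. Because the first and last characters are duplicated too, the output starts and ends with a line holding a single character; sed '1d;$d' deletes those two lines. Note that 1d and $d refer to the first and last lines of the whole output, so this works as written for single-line input such as example.txt.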
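
If the input may contain several lines, the trimming can be done inside sed instead, so that each line is handled on its own. The following is only a sketch of that idea and not part of the command above: the function name pairs_per_line is made up for illustration, --posix is left out because it is a GNU-only option, and lines shorter than two characters are not treated specially. It relies on \n in a sed regular expression matching a newline embedded in the pattern space.

```shell
# Hypothetical helper (name chosen for this example only).
# Step 1: turn every character c into "c<newline>c".
# Step 2: drop the lone first character and the newline after it.
# Step 3: drop the last newline and the lone last character.
pairs_per_line() {
  sed -e 's/./&\
&/g' -e 's/^.\n//' -e 's/\n.$//' "$@"
}

# Two input lines; pairs never span the line boundary.
printf 'abcd\nxyz\n' | pairs_per_line
# ab
# bc
# cd
# xy
# yz
```

Because the two extra substitutions run once per input line, the second sed process is no longer needed.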

Had sed supported lookarounds, one could have used (?!^).(?!$) (any character that is neither at the start nor at the end of the string), and the second sed command would not have been necessary. sed has no lookarounds, but Perl does: perl -pe 's/(?!^).(?!$)/$&\n$&/g' example.txt. Here $& in the replacement plays the same role as the & placeholder in sed: it stands for the whole match.

Wiktor Stribiżew answered Nov 07 '22 04:11


Try:

$ echo "abcd 10001." | awk '{for(i=1;i<length($0);i++) print substr($0,i,2)}'
ab
bc
cd
d 
 1
10
00
00
01
1.
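
The one-liner works because it slides a two-character window along the line one position at a time, which is something a single s///g cannot do: regex matches never overlap, so (.)(.) consumes both characters and the next match starts at the third one. A commented sketch of the same loop follows; the function name pairs is made up for this example, and input is read from the named files or from standard input.

```shell
# Hypothetical helper (name chosen for this example only).
pairs() {
  awk '{
    n = length($0)
    # i is the 1-based start of the window; stopping at n-1
    # keeps the 2-character window inside the line.
    for (i = 1; i < n; i++)
      print substr($0, i, 2)
  }' "$@"
}

printf 'abcd 10001.\n' | pairs
```

Since awk runs the block once per record, multi-line input needs no extra handling, and lines with fewer than two characters produce no output.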
dawg answered Nov 07 '22 06:11