
Change any number of delimiters in found pattern with sed

Tags: regex, sed, awk, perl

I want to change every . to " @.@ " (with a space on each side) using sed, but only if the . is enclosed by numbers.
For example:

This sentence ends with a dot. 1.2.3
Dot. 1.2.3.4.5 Dot.

The goal:

This sentence ends with a dot. 1 @.@ 2 @.@ 3
Dot. 1 @.@ 2 @.@ 3 @.@ 4 @.@ 5 Dot.

The pattern could contain any number of integers.

I tried:

sed -E 's/([0-9]+)\.([0-9]+)/\1 @\.@ \2/g'

but it only works for the first two numbers in the pattern.


asked Mar 01 '21 08:03

sedsed


2 Answers

For the repeated pattern (number-dot-number-dot-number...) that substitution doesn't work because the number following the dot is "consumed" by the match. The engine has then moved past it, so the next character it sees is a dot, not the needed num-dot-num pattern.
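The consumption can be seen by running the attempt from the question on one of the sample lines. This is only a small sketch; the function name `naive_pad` is made up for illustration, and the unneeded backslash in the replacement is dropped.

```shell
# naive_pad (hypothetical name): the substitution from the question,
# applied to standard input.
naive_pad() {
  sed -E 's/([0-9]+)\.([0-9]+)/\1 @.@ \2/g'
}

printf '%s\n' 'Dot. 1.2.3.4.5 Dot.' | naive_pad
# prints: Dot. 1 @.@ 2.3 @.@ 4.5 Dot.
```

Every second dot is skipped: after `1.2` is matched, the `2` is gone, so `.3` has no digit left in front of it to match.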

One solution is to use lookarounds,† which are "zero-width" assertions: the engine doesn't consume what they match and doesn't move along the string, but merely "looks" from its "spot" between characters to assert that the pattern (ahead or behind) matches. So, for instance

s/ (?<=[0-9]) \. (?=[0-9]) / @.@ /gx;

For a testable example (in Perl, as tagged)

perl -wE'$_=q(Dot. 1.2.3.4.5 Dot.); say; s/(?<=[0-9])\.(?=[0-9])/ @.@ /g; say'

which prints

Dot. 1.2.3.4.5 Dot.
Dot. 1 @.@ 2 @.@ 3 @.@ 4 @.@ 5 Dot.

Note that a lookbehind can't match a whole multi-digit "number", since that would need [0-9]+, which has variable and unlimited length, which lookbehinds can't (yet) do. That isn't needed here, though: asserting only the one digit right before the . is enough, so the above works with multi-digit numbers as well.

Alternatively, the number before the . can be captured and then put back. Consuming the number before the dot is not a problem, since the lookahead still leaves the number after the dot for the next match

s/([0-9]+)\.(?=[0-9])/$1 @.@ /g;

This can be done in any case, of course, even if it's always all single digits; I used the lookbehind originally only for symmetry with the other side (which needs a lookahead).

† In a tool that supports them, which, in my understanding, sed doesn't. (Thanks to potong and Ed Morton for pointing that out in comments.) I still offer this solution since Perl is one of the tagged languages.

answered Sep 22 '22 16:09

zdim

In the first line, the regex first matches 1.2. The search for the next match then starts at the . right after the previous match, and it fails there because the 2 needed before that dot has already been consumed, so the following .3 is left unchanged.
With sed, please try:

sed -E '
:l
s/([[:digit:]])\.([[:digit:]])/\1 @.@ \2/
t l
' file

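which repeats the pattern match from the start of the line: t l branches back to the label l whenever the substitution succeeded, so the loop runs until no digit-dot-digit remains.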
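The same loop can be written on one line with separate `-e` expressions and tried on the sample input from the question. This is a sketch; the function name `loop_pad` is made up for illustration.

```shell
# loop_pad (hypothetical name): substitute one digit-dot-digit at a time and
# branch back to label l (t l) for as long as a substitution was made.
loop_pad() {
  sed -E -e ':l' -e 's/([[:digit:]])\.([[:digit:]])/\1 @.@ \2/' -e 't l'
}

printf '%s\n' 'This sentence ends with a dot. 1.2.3' 'Dot. 1.2.3.4.5 Dot.' | loop_pad
# prints:
# This sentence ends with a dot. 1 @.@ 2 @.@ 3
# Dot. 1 @.@ 2 @.@ 3 @.@ 4 @.@ 5 Dot.
```

Only the single digits adjacent to the dot are matched, so multi-digit numbers such as 12.34 are handled too. The loop terminates because the inserted ` @.@ ` never has a digit directly next to its dot.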

Since the question is also tagged perl, here is an alternative with perl:

perl -pe 's/(?<=\d)\.(?=\d)/ @.@ /g' file

answered Sep 21 '22 16:09

tshiono
