A Linux Shell Script Problem

Tags: string, linux, shell, sed

I have a dot-separated string in a Linux shell:

example=This.is.My.String

I want to:

1. Add a string before the last dot. For example, adding "Goood.Long" before the last dot should give:

This.is.My.Goood.Long.String

2. Get the part after the last dot, which should give:

String

3. Turn every dot except the last one into an underscore, which should give:

This_is_My.String

If you have time, please explain a little, as I am still learning regular expressions.

Thanks a lot!

asked Nov 09 '10 20:11

DocWiki

3 Answers

I don't know what you mean by 'Linux Shell' so I will assume bash. This solution will also work in zsh, etcetera:

```
example=This.is.My.String
before_last_dot=${example%.*}    # strip the shortest suffix matching ".*"
after_last_dot=${example##*.}    # strip the longest prefix matching "*."

echo "${before_last_dot}.Goood.Long.${after_last_dot}"
# This.is.My.Goood.Long.String

echo "${before_last_dot//./_}.${after_last_dot}"
# This_is_My.String
```

The interim variables before_last_dot and after_last_dot should explain my usage of the % and ## operators. I think the // is also self-explanatory, but I'd be happy to clarify if you have any questions.

This doesn't use sed (or even regular expressions), only bash's built-in parameter expansion. I prefer to stick to just one language per script, with as few forks as possible :-)
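For anyone who does want the operators spelled out, here is a minimal sketch of how I understand them. It assumes bash (the `//` form is not plain POSIX sh), and the variable names are made up for illustration.

```shell
example=This.is.My.String

# %  removes the shortest suffix matching the glob pattern ".*"
shortest_suffix_removed=${example%.*}     # This.is.My
# %% removes the longest suffix matching the same pattern
longest_suffix_removed=${example%%.*}     # This

# #  removes the shortest prefix matching the glob pattern "*."
shortest_prefix_removed=${example#*.}     # is.My.String
# ## removes the longest prefix matching the same pattern
longest_prefix_removed=${example##*.}     # String

# /  replaces the first match, // replaces every match
first_dot_replaced=${example/./_}         # This_is.My.String
all_dots_replaced=${example//./_}         # This_is_My_String

printf '%s\n' "$shortest_suffix_removed" "$longest_prefix_removed" "$all_dots_replaced"
```

One thing to watch: if the string contains no dot at all, neither `%.*` nor `##*.` removes anything, so both interim variables end up holding the whole string.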

answered Sep 23 '22 23:09

Johnsyweb

Other users have given good answers for #1 and #2. There are some disadvantages to some of the answers for #3. In one case, you have to run the substitution twice. In another, if your string has other underscores they might get clobbered. This command works in one go and only affects dots:

sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//'

1. Splits the line before the last dot by inserting a newline, and copies the result into hold space:
   ```
   s/\(.*\)\./\1\n./;h
   ```
2. Removes everything up to and including the newline from the copy in pattern space, then swaps hold space and pattern space:
   ```
   s/[^\n]*\n//;x
   ```
3. Removes everything from the newline onwards from the copy that's now in pattern space:
   ```
   s/\n.*//
   ```
4. Changes all dots into underscores in the copy in pattern space, then appends hold space onto the end of pattern space:
   ```
   s/\./_/g;G
   ```
5. Removes the newline that the append operation adds:
   ```
   s/\n//
   ```

Then the sed script is finished and the pattern space is output.

At the end of each numbered step (some consist of two actual steps):

| Step | Pattern Space | Hold Space |
| --- | --- | --- |
| 1 | `This.is.My\n.String` | `This.is.My\n.String` |
| 2 | `This.is.My\n.String` | `.String` |
| 3 | `This.is.My` | `.String` |
| 4 | `This_is_My\n.String` | `.String` |
| 5 | `This_is_My.String` | `.String` |
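To try the one-liner on a few inputs, it can be wrapped in a small function. This is only a sketch: it assumes GNU sed (as far as I know, `\n` in the replacement and inside a bracket expression is a GNU extension), and the function name is made up for illustration.

```shell
# Hypothetical helper: turn every dot except the last into an underscore.
# Assumes GNU sed, because the script relies on \n escapes.
dots_to_underscores_except_last() {
    printf '%s\n' "$1" |
        sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//'
}

dots_to_underscores_except_last This.is.My.String   # This_is_My.String
dots_to_underscores_except_last a_b.c.d             # a_b_c.d
```

Existing underscores survive untouched. Tracing the steps by hand, an input with no dot at all seems to come out doubled (the unchanged line is held and then appended to itself), so it is probably worth guarding against that case if it can occur.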

answered Sep 24 '22 23:09

Dennis Williamson

Solution

1. Two versions of this, too:
   - Complex: `sed 's/\(.*\)\([.][^.]*\)$/\1.Goood.Long\2/'`
   - Simple: `sed 's/.*\./&Goood.Long./'` - thanks Dennis Williamson
2. What do you want?
   - Complex: `sed 's/.*[.]\([^.]*\)$/\1/'`
   - Simpler: `sed 's/.*\.//'` - thanks, glenn jackman.
3. `sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g'`

With 3, you probably need to run the substitute (in its entirety) at least twice, in general.
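As a quick check, the commands above can be run against the string from the question. This is a minimal sketch; the variable names are mine, and the capture groups use the basic-regex `\(...\)` form.

```shell
example=This.is.My.String

# Task 1: insert Goood.Long before the last dot (complex and simple forms)
insert_complex=$(printf '%s\n' "$example" | sed 's/\(.*\)\([.][^.]*\)$/\1.Goood.Long\2/')
insert_simple=$(printf '%s\n' "$example" | sed 's/.*\./&Goood.Long./')

# Task 2: keep only the part after the last dot
suffix_complex=$(printf '%s\n' "$example" | sed 's/.*[.]\([^.]*\)$/\1/')
suffix_simple=$(printf '%s\n' "$example" | sed 's/.*\.//')

# Task 3: a single pass leaves one dot still to be replaced
one_pass=$(printf '%s\n' "$example" | sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g')

printf '%s\n' "$insert_complex" "$insert_simple" "$suffix_complex" "$suffix_simple" "$one_pass"
```

The last line printed is `This_is.My.String`, which shows why a single pass is not enough for task 3.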

Explanation

Remember, in sed, the notation \(...\) is a 'capture' that can be referenced as '\1' or similar in the replacement text.

1. Capture everything up to a string starting with a dot followed by a sequence of non-dots (which you also capture); replace by what came before the last dot, the new material, and the last dot and what came after it.
2. Ignore everything up to the last dot followed by a capture of a sequence of non-dots; replace with the capture only.
3. Find and capture a sequence of non-dots, then a dot (not captured), then capture a sequence of non-dots and a dot; replace the first dot with an underscore. This is done globally, but the second and subsequent matches won't touch anything already matched, so the dot that ends one match cannot be replaced in the same pass. Therefore, I think you need ceil(log₂(N+1)) passes, where N is the number of dots to be replaced. One pass deals with 1 dot to replace; two passes deal with 2 or 3; three passes deal with 4-7, and so on.
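The repeated passes for item 3 can be sketched as a loop that reruns the substitution until the string stops changing. This is not part of the original one-liner, just one way to automate "run it again"; the function name is hypothetical.

```shell
# Hypothetical helper: rerun the item-3 substitution until nothing changes.
underscore_all_but_last_dot() {
    local current=$1 previous=
    while [ "$current" != "$previous" ]; do
        previous=$current
        # One pass: replaces roughly every other eligible dot
        current=$(printf '%s\n' "$previous" |
            sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g')
    done
    printf '%s\n' "$current"
}

underscore_all_but_last_dot This.is.My.String   # This_is_My.String
underscore_all_but_last_dot a.b.c.d.e.f         # a_b_c_d_e.f
```

The loop always runs one extra pass to detect that nothing changed, so it trades a little work for not having to count the dots up front.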