Replace/delete special characters within matched strings in sed

Tags:

regex

sed

I have a file containing lines like

I want a lot <*tag 1> more <*tag 2>*cheese *cakes.

I am trying to remove the * within <> but not outside. The tags can be more complicated than above. For example, <*better *tag 1>.

I tried /\bregex\b/s/\*//g, which works for tag 1 but not tag 2. So how can I make it work for tag 2 as well?

Many thanks.


asked May 30 '13 17:05

ToonZ

3 Answers

Obligatory Perl solution:

perl -pe '$_ = join "",
        map +($i++ % 2 == 0 ? $_ : s/\*//gr),
        split /(<[^>]+>)/, $_;' FILE

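Addendum: a shorter version that runs a second substitution on each matched tag. The /r modifier used in both versions needs Perl 5.14 or later.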

perl -pe 's/(<[^>]+>)/$1 =~ s(\*)()gr/ge' FILE


answered Nov 02 '22 23:11

bambams

Simple solution if each tag contains at most one asterisk:

sed 's/<\([^>]*\)\*\([^>]*\)>/<\1\2>/g'
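To see why a single pass is not enough for tags like <*better *tag 1>, here is a small sketch running the command above once (one_pass is just an illustrative name). The first group \([^>]*\) is greedy, so each pass removes only the last asterisk in every tag.

```shell
# one_pass (hypothetical helper): the single-asterisk sed command, run once.
one_pass() {
    sed 's/<\([^>]*\)\*\([^>]*\)>/<\1\2>/g'
}

printf '%s\n' '<*tag 2>' | one_pass            # prints <tag 2>
printf '%s\n' '<*better *tag 1>' | one_pass    # prints <*better tag 1>
```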

If a tag can contain more than one asterisk, you can loop using a sed label and the conditional branch command t:

sed ':doagain s/<\([^>]*\)\*\([^>]*\)>/<\1\2>/g; t doagain'

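Here :doagain defines the label at the top of the loop, and t doagain jumps back to it whenever the s command has made a substitution. Each pass removes one asterisk from every tag, so the loop repeats until no tag contains an asterisk. From the sed manual: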

t label

 Branch to label only if there has been a successful substitution since the last 
 input line was read or conditional branch was taken. The label may be omitted, in 
 which case the next cycle is started.
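POSIX only guarantees that a label ends at a newline, and some implementations (BSD/macOS sed, for example) may treat the rest of the line after : as part of the label. A more portable sketch of the same loop passes each piece as its own -e expression, since sed joins -e expressions with newlines. strip_tag_stars_sed is a hypothetical wrapper name.

```shell
# strip_tag_stars_sed (hypothetical helper): the looping answer, written so
# the label is terminated by a line break, as POSIX sed requires.
strip_tag_stars_sed() {
    sed -e ':doagain' \
        -e 's/<\([^>]*\)\*\([^>]*\)>/<\1\2>/g' \
        -e 't doagain' "$@"
}

printf '%s\n' 'I want a lot <*tag 1> more <*tag 2>*cheese *cakes.' \
              '<*better *tag X*>' | strip_tag_stars_sed
# I want a lot <tag 1> more <tag 2>*cheese *cakes.
# <better tag X>
```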

answered Nov 02 '22 23:11

bartimar

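awk could solve your problem. It needs GNU awk (gawk): the fourth argument of split(), which stores the matched separators (here, the tags) in the array s, is a gawk extension. r is reset at the start of each line so that text from earlier lines is not carried into later output:

awk '{r=""; x=split($0,a,/<[^>]*>/,s); for(i in s) gsub(/\*/,"",s[i]); for(j=1;j<=x;j++) r=r a[j] s[j]; print r}' file

more readable version:

awk '{
    r = ""
    x = split($0, a, /<[^>]*>/, s)     # a: text between tags, s: the tags
    for (i in s) gsub(/\*/, "", s[i])   # strip * from the tags only
    for (j = 1; j <= x; j++) r = r a[j] s[j]
    print r
}' file

test with your data:

kent$ cat file
I want a lot <*tag 1> more <*tag 2>*cheese *cakes.
<*better *tag X*>
kent$ awk '{r=""; x=split($0,a,/<[^>]*>/,s); for(i in s) gsub(/\*/,"",s[i]); for(j=1;j<=x;j++) r=r a[j] s[j]; print r}' file
I want a lot <tag 1> more <tag 2>*cheese *cakes.
<better tag X>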

Kent
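If gawk is not available, the same idea can be sketched with POSIX awk features only. This swaps the four-argument split() for a match()/RSTART/RLENGTH loop; strip_tag_stars_awk is a hypothetical helper name.

```shell
# strip_tag_stars_awk (hypothetical helper): POSIX awk only.
# Walks each line, copying the text before every <...> unchanged and the
# tag itself with its asterisks removed.
strip_tag_stars_awk() {
    awk '{
        out = ""
        rest = $0
        while (match(rest, /<[^>]*>/)) {
            tag = substr(rest, RSTART, RLENGTH)
            gsub(/\*/, "", tag)
            out = out substr(rest, 1, RSTART - 1) tag
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}

printf '%s\n' 'I want a lot <*tag 1> more <*tag 2>*cheese *cakes.' | strip_tag_stars_awk
# I want a lot <tag 1> more <tag 2>*cheese *cakes.
```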
