
Regex replace text outside html tags

I have this HTML:

"This is simple html text <span class='simple'>simple simple text text</span> text"

I need to match only words that are outside any HTML tag. For example, if I search for “simple” and “text”, I should get results only from the leading “This is simple html text” and the trailing “text”: 1 match for “simple” and 2 matches for “text”. Could anyone help me with this? I’m using jQuery.

var pattern = new RegExp("(\\b" + value + "\\b)", 'gi');

if (pattern.test(text)) {
    text = text.replace(pattern, "<span class='notranslate'>$1</span>");
}
  • value is the word I want to match (in this case “simple”)
  • text is "This is simple html text <span class='simple'>simple simple text text</span> text"

I need to wrap every selected word (such as “simple” in this example) in a <span>, but only where it appears outside any HTML tags. After doing this for both “simple” and “text”, the result should be:

This is <span class='notranslate'>simple</span> html <span class='notranslate'>text</span> <span class='simple'>simple simple text text</span> <span class='notranslate'>text</span>

I do not want to replace any text inside

<span class='simple'>simple simple text text</span>

It should be the same as before replacement.

Sanya530 asked Sep 04 '13 18:09


2 Answers

Okay, try using this regex:

(text|simple)(?![^<]*>|[^<>]*</)

Example worked on regex101.

Breakdown:

(         # Open capture group
  text    # Match 'text'
|         # Or
  simple  # Match 'simple'
)         # End capture group
(?!       # Negative lookahead start (will cause match to fail if contents match)
  [^<]*   # Any number of non-'<' characters
  >       # A > character
|         # Or
  [^<>]*  # Any number of non-'<' and non-'>' characters
  </      # The characters < and /
)         # End negative lookahead.

The negative lookahead prevents a match when text or simple appears inside a tag (for example, in an attribute value) or in text that is immediately followed by a closing tag.
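
To tie this back to the replacement code in the question, here is a minimal sketch that builds the same lookahead around a list of words and runs the replace. The helper names escapeRegExp and wrapOutsideTags are made up for illustration, and the added \b word boundaries come from the question's pattern rather than from the regex above.

```javascript
// Hypothetical helper: escape regex metacharacters in a user-supplied word.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap each listed word in a notranslate span unless it
// sits inside a tag or is directly followed by a closing tag.
function wrapOutsideTags(text, words) {
  const alternation = words.map(escapeRegExp).join("|");
  const pattern = new RegExp(
    "\\b(" + alternation + ")\\b(?![^<]*>|[^<>]*</)",
    "gi"
  );
  return text.replace(pattern, "<span class='notranslate'>$1</span>");
}

const text = "This is simple html text <span class='simple'>simple simple text text</span> text";
const result = wrapOutsideTags(text, ["simple", "text"]);
console.log(result);
```

For the question's input this should produce the desired output from the question. One limitation to keep in mind: the lookahead only rejects text that runs straight into a closing tag, so in something like <p>simple <b>x</b></p> the word before the nested <b> would still be wrapped.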

Jerry answered Oct 11 '22 09:10


^([^<]*)<\w+.*/\w+>([^<]*)$

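This captures the text before the first tag in group 1 and the text after the last closing tag in group 2. However, it is a very naive expression; it would be better to use a DOM parser.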
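
As a rough sketch of how the expression above could be applied to the question's input: take the two captured groups, wrap words only in those, and put the untouched middle back. The helper names wrapWords and wrapAroundMarkup are hypothetical, and the words are assumed to contain no regex metacharacters.

```javascript
// Hypothetical helper: wrap whole-word, case-insensitive occurrences of the
// listed words. The words are assumed to contain no regex metacharacters.
function wrapWords(fragment, words) {
  const pattern = new RegExp("\\b(" + words.join("|") + ")\\b", "gi");
  return fragment.replace(pattern, "<span class='notranslate'>$1</span>");
}

// Hypothetical helper: split off the text before the first tag and after the
// last closing tag, wrap words only there, and leave the middle untouched.
function wrapAroundMarkup(html, words) {
  const outer = /^([^<]*)<\w+.*\/\w+>([^<]*)$/;
  const match = html.match(outer);
  if (match === null) {
    // No match: plain text is wrapped as a whole; anything containing
    // markup that the naive pattern cannot handle is returned unchanged.
    return html.includes("<") ? html : wrapWords(html, words);
  }
  const before = match[1];
  const after = match[2];
  const middle = html.slice(before.length, html.length - after.length);
  return wrapWords(before, words) + middle + wrapWords(after, words);
}

const html = "This is simple html text <span class='simple'>simple simple text text</span> text";
const wrapped = wrapAroundMarkup(html, ["simple", "text"]);
console.log(wrapped);
```

This also shows why the expression is naive: everything from the first tag to the last closing tag is treated as one untouched block, so plain text sitting between two sibling elements would never be wrapped.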

Explosion Pills answered Oct 11 '22 10:10