I have this HTML:
"This is simple html text <span class='simple'>simple simple text text</span> text"
I need to match only words that are outside any HTML tag. For example, if I want to match “simple” and “text”, I should get results only from “This is simple html text” and the final “text”: “simple” 1 match and “text” 2 matches. Could anyone help me with this? I’m using jQuery.
var pattern = new RegExp("(\\b" + value + "\\b)", 'gi');
if (pattern.test(text)) {
text = text.replace(pattern, "<span class='notranslate'>$1</span>");
}
value is the word I want to match (in this case “simple”), and text is "This is simple html text <span class='simple'>simple simple text text</span> text".
I need to wrap every selected word (in this example, “simple” and “text”) with a <span>. But I want to wrap only words that are outside any HTML tags. The result for this example should be:
This is <span class='notranslate'>simple</span> html <span class='notranslate'>text</span> <span class='simple'>simple simple text text</span> <span class='notranslate'>text</span>
I do not want to replace any text inside
<span class='simple'>simple simple text text</span>
It should stay the same as before the replacement.
Find and replace text using regular expressions: when you want to search for and replace specific patterns of text, use regular expressions. They help with pattern matching, parsing, filtering results, and so on. Once you learn the regex syntax, you can use it in almost any language.
To use a regex, pass regex syntax such as /regex/ as the first argument of replace. This pattern determines which parts of the string are replaced with the new substring. For example, the string 3foobar4 matches the regex /\d.*\d/, so it is replaced.
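In JavaScript, that example looks roughly like this; the replacement string "#" is an arbitrary choice for illustration.

```javascript
// /\d.*\d/ matches a digit, then any characters, then a final digit.
const digitSpan = /\d.*\d/;

// The whole string "3foobar4" matches, so all of it is replaced.
const result = "3foobar4".replace(digitSpan, "#");
console.log(result); // prints: #
```

Only the matched part is replaced, so "a3foobar4z".replace(digitSpan, "#") would give "a#z".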
In .NET, the Regex.Replace(String, String, MatchEvaluator, RegexOptions) method is useful for replacing a regular expression match when, for example, the replacement string cannot readily be specified by a regular expression replacement pattern.
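JavaScript has no MatchEvaluator, but String.prototype.replace accepts a function as its second argument, which plays a similar role. The sketch below uses that to attack the original problem differently (a swapped-in technique, not one of the answers): it walks the input as a sequence of tags and text runs, tracks nesting depth, and wraps words only in text that sits outside every element. wrapTopLevel is a hypothetical helper name. It assumes well-formed markup with no comments, no < or > inside text or attribute values, void elements written self-closing (a bare <br> would throw off the depth count), and words without regex metacharacters.

```javascript
// Hypothetical helper: wrap whole-word matches only in text outside every element.
// Assumes well-formed HTML without comments, without < or > in text or attributes,
// with void elements written self-closing (e.g. <br/>), and plain-word search terms.
function wrapTopLevel(html, words) {
  const wordRe = new RegExp("\\b(" + words.join("|") + ")\\b", "gi");
  let depth = 0; // number of elements currently open

  // Each match is either a whole tag or a run of text between tags;
  // the callback decides what to put back for each one.
  return html.replace(/<[^>]*>|[^<]+/g, (token) => {
    if (token.startsWith("<")) {
      if (token.startsWith("</")) depth--;     // closing tag
      else if (!token.endsWith("/>")) depth++; // opening tag (not self-closing)
      return token;                            // tags are never modified
    }
    // Text run: rewrite it only when no element is open.
    return depth === 0
      ? token.replace(wordRe, "<span class='notranslate'>$1</span>")
      : token;
  });
}

const sample = "This is simple html text <span class='simple'>simple simple text text</span> text";
console.log(wrapTopLevel(sample, ["simple", "text"]));
```

Because it counts open elements instead of looking ahead to the next <, this version also leaves words alone inside nested markup such as <b>simple <i>x</i></b>.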
While parsing arbitrary HTML with only a regex is impossible, it is sometimes appropriate to use regexes to parse a limited, known set of HTML. If you have a small set of HTML pages that you want to scrape data from and then store in a database, regexes might work fine.
Okay, try using this regex:
(text|simple)(?![^<]*>|[^<>]*</)
Example working on regex101.
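To get the per-word counts asked for in the question, the same pattern can be run with String.prototype.matchAll (ES2020) in plain JavaScript; jQuery is not needed for this part. The i flag and the lowercasing are additions so that "Text" and "text" would be counted together.

```javascript
const html = "This is simple html text <span class='simple'>simple simple text text</span> text";

// The regex from this answer; the lookahead rejects words inside a tag or an element.
const outsideTags = /(text|simple)(?![^<]*>|[^<>]*<\/)/gi;

// Count how often each word occurs outside tags.
const counts = {};
for (const match of html.matchAll(outsideTags)) {
  const word = match[1].toLowerCase();
  counts[word] = (counts[word] || 0) + 1;
}
console.log(counts); // { simple: 1, text: 2 }
```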
Breakdown:
( # Open capture group
text # Match 'text'
| # Or
simple # Match 'simple'
) # End capture group
(?! # Negative lookahead start (will cause match to fail if contents match)
[^<]* # Any number of non-'<' characters
> # A > character
| # Or
[^<>]* # Any number of non-'<' and non-'>' characters
</ # The characters < and /
) # End negative lookahead.
The negative lookahead prevents a match when text or simple is inside a tag (for example, in an attribute value such as class='simple') or inside an element, before its closing tag.
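Plugged into the replace call from the question, the lookahead should produce the expected output. This is a sketch: wrapOutsideTags and escapeRegExp are hypothetical helper names, and the \b word boundaries are an addition (without them "text" would also match inside a word such as "context"). Because the lookahead only inspects text up to the next <, a word inside an element that contains a nested tag before its closing tag (e.g. <b>simple <i>x</i></b>) would still be wrapped.

```javascript
// Escape regex metacharacters so the search words are matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap each listed word that appears outside tags and outside element content.
function wrapOutsideTags(html, words) {
  const alternation = words.map(escapeRegExp).join("|");
  // "</" needs no escaping when the pattern is built from a string.
  const pattern = new RegExp(
    "\\b(" + alternation + ")\\b(?![^<]*>|[^<>]*</)",
    "gi"
  );
  // replace() with a global regex always starts at index 0, so no test() guard is needed.
  return html.replace(pattern, "<span class='notranslate'>$1</span>");
}

const input = "This is simple html text <span class='simple'>simple simple text text</span> text";
console.log(wrapOutsideTags(input, ["simple", "text"]));
```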
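Alternatively, this expression captures the text before the first opening tag and after the last closing tag:
^([^<]*)<\w+.*/\w+>([^<]*)$
However, this is a very naive expression. It would be better to use a DOM parser.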
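Applied to the sample string, the two capture groups of this naive expression hold the text before the first opening tag and after the last closing tag, and the words can then be searched in those pieces only. The sketch assumes the input contains at least one element; otherwise the regex does not match and exec returns null. Any text sitting between two top-level elements would be lost.

```javascript
const source = "This is simple html text <span class='simple'>simple simple text text</span> text";

// Group 1: text before the first opening tag; group 2: text after the last closing tag.
// The "/" must be escaped inside a regex literal.
const naive = /^([^<]*)<\w+.*\/\w+>([^<]*)$/;
const parts = naive.exec(source);

if (parts !== null) {
  console.log(JSON.stringify(parts[1])); // "This is simple html text "
  console.log(JSON.stringify(parts[2])); // " text"
}
```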