I'm searching a page to find a specific keyword. That itself is easy enough. The added complication is that I don't want to match this keyword if it is part of an <a> tag.
E.g.
<p>Here is some example content that has a keyword in it.
I want to match this keyword here but, i don't want to match
the <a href="http://www.keyword.com">keyword</a> here.</p>
If you look at the above example content, the word 'keyword' appears 4 times. I want to match the first two times it appears within the paragraph, but I do not want to match it when it appears as part of the href and as part of the <a> content.
So far I've managed to use this below:
var tester = new RegExp("((?!<a.*?>)("+keyword+")(?!</a>))", 'ig');
The problem with the above is that it still matches the keyword if it is part of the href.
Any ideas? Thanks
You can't reliably do this with JavaScript regexes. It's hard enough to do with the .NET regex engine, which is one of the few to support infinite-length lookbehind assertions, but JavaScript (before ES2018, at least) doesn't support lookbehind assertions at all, so you can't look back to see what came before the text you do want to match.
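To make the problem concrete, here is a small sketch that runs the regex from the question against the example content. A lookahead is tested at the position where the keyword starts, so `(?!<a.*?>)` only ever inspects text beginning with the keyword itself and never rejects anything. The `sample` and `contexts` names are just illustrative.

```javascript
const keyword = "keyword";
// The regex from the question, unchanged.
const tester = new RegExp("((?!<a.*?>)(" + keyword + ")(?!</a>))", "ig");

const sample =
  "<p>Here is some example content that has a keyword in it.\n" +
  "I want to match this keyword here but, i don't want to match\n" +
  'the <a href="http://www.keyword.com">keyword</a> here.</p>';

// Collect a few characters of context around every match.
const contexts = Array.from(sample.matchAll(tester), (m) =>
  sample.slice(Math.max(0, m.index - 4), m.index + keyword.length + 4)
);

console.log(contexts.length); // 3: only the link text is rejected
console.log(contexts[2]);     // "www.keyword.com", i.e. the match inside the href
```

Only `(?!</a>)` does any work here, and it can only exclude a keyword that is immediately followed by `</a>`.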
So you should either use a DOM parser (I'm sure someone fluent in JavaScript can suggest a practical approach here), or read the text, remove all the <a> tags (which you sort of could do with a regex, if you're the brave type), and then search for your keyword in the rest of the text.
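A minimal sketch of the second option (strip the links, then search), assuming the anchors are balanced and not nested. It removes each whole `<a ...>...</a>` element, attributes and content included, since neither should be matched. The helper names `escapeRegExp` and `countOutsideAnchors` are made up for this example.

```javascript
// Escape characters that are special in a regex, so the keyword is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: count keyword occurrences outside <a>...</a> elements.
function countOutsideAnchors(html, keyword) {
  // Drop every anchor element together with its attributes and link text.
  // This is the "brave" regex step and assumes balanced, non-nested anchors.
  const withoutAnchors = html.replace(/<a\b[^>]*>[\s\S]*?<\/a\s*>/gi, "");
  const matches = withoutAnchors.match(new RegExp(escapeRegExp(keyword), "gi"));
  return matches ? matches.length : 0;
}

const sample =
  "<p>Here is some example content that has a keyword in it.\n" +
  "I want to match this keyword here but, i don't want to match\n" +
  'the <a href="http://www.keyword.com">keyword</a> here.</p>';

console.log(countOutsideAnchors(sample, "keyword")); // 2
```

Note that the match positions refer to the stripped text, not the original page, so this suits counting or testing for presence better than highlighting in place.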
EDIT:
Well, there is a dirty hack that you could use. It's not pretty, and if you look at Alan Moore's comment to your question, you'll be able to imagine a multitude of ways in which this regex will fail, but it does work on your example:
/keyword(?!(?:(?!<a).)*<\/a)/
How does it "work"?
keyword      # Match "keyword"
(?!          # but only if it is not possible to match the following regex in the text ahead:
  (?:        # - Match...
    (?!<a)   # -- unless it's the start of an <a> tag...
    .        # -- any character
  )*         # - any number of times
  </a        # then match the start of a closing </a> tag (written <\/a inside a JavaScript regex literal).
)            # End of lookahead assertion.
This is quite cryptic, even with the explanation. What it essentially does is:
- match the keyword,
- but only if there is no closing </a> in the following text,
- unless an opening <a> tag comes first.
So if all your <a> tags are correctly balanced, not nested, not found inside comments or script blocks, you might just get away with it.
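Here is a quick check of the hack against the example content, as a sketch rather than a recommendation. The `g` and `i` flags are added to find every occurrence, and the `/` is escaped so the pattern is valid as a regex literal. The last line shows one of the ways it fails: a stray `</a>` later on the same line suppresses a keyword that is not inside any link.

```javascript
const pattern = /keyword(?!(?:(?!<a).)*<\/a)/gi;

const sample =
  "<p>Here is some example content that has a keyword in it.\n" +
  "I want to match this keyword here but, i don't want to match\n" +
  'the <a href="http://www.keyword.com">keyword</a> here.</p>';

// Collect the start index of every match.
const found = Array.from(sample.matchAll(pattern), (m) => m.index);
console.log(found.length); // 2: the href and the link text are skipped

// Failure case: an unbalanced closing tag hides a keyword outside any link.
const stray = "a keyword, then a stray </a>";
console.log(stray.match(pattern)); // null
```

Also keep in mind that `.` does not match line breaks unless the `s` flag is set, so a link whose text spans several lines can slip through.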