How to match a keyword on a web page that is NOT within an <a> and its href, using JavaScript?

Tags: javascript, html, regex

I'm searching a page to find a specific keyword. That itself is easy enough. The added complication is that I don't want to match this keyword if it is part of an <a> tag.

E.g.

<p>Here is some example content that has a keyword in it. 
I want to match this keyword here but, i don't want to match 
the <a href="http://www.keyword.com">keyword</a> here.</p>

If you look at the example content above, the word 'keyword' appears 4 times. I want to match the first two times it appears within the paragraph, but I do not want to match it when it appears as part of the href or as the content of the <a> element.

So far I've come up with this:

var tester = new RegExp("((?!<a.*?>)("+keyword+")(?!</a>))", 'ig');

The problem is that it still matches the keyword when it is part of the href.
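To see why, here is a small sketch that runs the pattern from the question against the example paragraph (joined onto one line; the names `html` and `matches` are just for illustration). The leading `(?!<a.*?>)` never rejects anything, because at every position where "keyword" can match, the text starts with "k", not "<a". The trailing `(?!</a>)` only rejects the link text, since that is the only occurrence immediately followed by `</a>`. The occurrence inside the href is followed by `.com"`, so it still matches.

```javascript
// The example paragraph from the question, joined onto one line.
const html =
  "Here is some example content that has a keyword in it. " +
  "I want to match this keyword here but, i don't want to match " +
  'the <a href="http://www.keyword.com">keyword</a> here.';

const keyword = "keyword";
// The pattern from the question, unchanged.
const tester = new RegExp("((?!<a.*?>)(" + keyword + ")(?!</a>))", "ig");

const matches = [...html.matchAll(tester)];
console.log(matches.length); // 3: two in the text, one inside the href
```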

Any ideas? Thanks


asked Jan 25 '11 14:01

user589080

1 Answer

You can't reliably do this with JavaScript regexes. It's hard enough even with the .NET regex engine, one of the few that support infinite-length lookbehind assertions, but JavaScript had no lookbehind assertions at all before ECMAScript 2018, so there is no way to look back and see what came before the text you do want to match.

So you should either use a DOM parser (I'm sure someone fluent in JavaScript can suggest a practical approach here), or read the text, remove all the <a> elements together with their contents (which you sort of could do with a regex, if you're the brave type), and then search for your keyword in what remains.
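Both alternatives might look roughly like this. This is a minimal sketch, not a definitive implementation; the function names `findKeywordOutsideLinks` and `countKeywordOutsideLinks` are hypothetical. The DOM version walks the tree using only the standard `nodeType`, `nodeName`, `nodeValue` and `childNodes` properties, searches text nodes only (so attribute values such as href are never looked at), and skips every `<a>` element together with its descendants. The string version assumes links are not nested and that no `</a>` appears inside comments, scripts or attribute values.

```javascript
// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// DOM approach: collect every match of `keyword` in the text nodes under
// `root`, skipping <a> elements and everything inside them.
function findKeywordOutsideLinks(root, keyword) {
  const pattern = new RegExp(escapeRegExp(keyword), "gi");
  const matches = [];

  function walk(node) {
    if (node.nodeType === 3) {
      // Text node: record each match together with the node it is in.
      for (const m of node.nodeValue.matchAll(pattern)) {
        matches.push({ node: node, index: m.index, text: m[0] });
      }
      return;
    }
    // Element node <a> (HTML or SVG): skip it and its descendants.
    if (node.nodeType === 1 && node.nodeName.toUpperCase() === "A") return;
    // Anything else (elements, the document): recurse into the children.
    // Comment nodes have no children, so they are effectively ignored.
    for (const child of node.childNodes) walk(child);
  }

  walk(root);
  return matches;
}

// String approach (the "brave" one): replace each <a ...>...</a> with a
// space, so neighbouring words are not glued together, then count what is left.
function countKeywordOutsideLinks(html, keyword) {
  const withoutLinks = html.replace(/<a\b[^>]*>[\s\S]*?<\/a\s*>/gi, " ");
  return (withoutLinks.match(new RegExp(escapeRegExp(keyword), "gi")) || []).length;
}
```

In a browser you would call something like `findKeywordOutsideLinks(document.body, "keyword")`; for the paragraph in the question it should return the two occurrences in the first text node and nothing from the link.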

EDIT:

Well, there is a dirty hack that you could use. It's not pretty, and if you look at Alan Moore's comment on your question, you'll be able to imagine a multitude of ways in which this regex will fail, but it does work on your example:

/keyword(?!(?:(?!<a).)*<\/a)/
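A quick way to check this against the example is sketched below (the variable names are only for illustration). Note that inside a regex literal the `/` in `</a` has to be escaped as `<\/a`, otherwise it ends the literal early; the `g` flag is added here so that `matchAll` finds every hit.

```javascript
// The example paragraph from the question, joined onto one line.
const html =
  "Here is some example content that has a keyword in it. " +
  "I want to match this keyword here but, i don't want to match " +
  'the <a href="http://www.keyword.com">keyword</a> here.';

// The hack, with "/" escaped and the g flag added.
const hack = /keyword(?!(?:(?!<a).)*<\/a)/g;

const positions = [...html.matchAll(hack)].map((m) => m.index);
console.log(positions.length); // 2: only the occurrences before the link
```

On the original multi-line version the result is the same, but `.` does not match line breaks, so a keyword in link text whose closing `</a>` sits on a later line would slip through. Adding the `s` (dotAll) flag, available since ECMAScript 2018, lets `.` cross lines.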

How does it "work"?

keyword    # Match "keyword"
(?!        # but only if it is not possible to match the following regex in the text ahead:
 (?:       # - Match...
  (?!<a)   # -- unless it's the start of an <a> tag...
  .        # -- any character
 )*        # - any number of times
 </a       # then match the start of a closing </a> tag.
)          # End of lookahead assertion.

This is quite cryptic, even with the explanation. What it essentially does is:

Match "keyword",
then look ahead and make sure there is no closing </a> tag in the following text,
unless an opening <a> tag comes first.

So if all your <a> tags are correctly balanced, not nested, and not found inside comments or script blocks, you might just get away with it.
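To illustrate one of those failure modes, here is a small sketch with a made-up input: a keyword that is plainly outside any link, followed by an HTML comment that happens to contain `</a>`. The regex knows nothing about comments, so it sees the `</a` and wrongly rejects the keyword.

```javascript
// The hack, with "/" escaped and the g flag added.
const hack = /keyword(?!(?:(?!<a).)*<\/a)/g;

// Hypothetical input: the keyword is not in a link, but a comment mentions </a>.
const html = "<p>keyword</p> <!-- old link removed: </a> -->";

console.log(html.match(hack)); // null: the keyword is wrongly rejected
```

A search that walks the parsed DOM does not have this problem, because comments never show up as text nodes.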

answered Sep 25 '22 18:09

Tim Pietzcker
