I'm trying to extract hashtags from HTML text with the regular expression #([a-z0-9_]+), but I'm having trouble with "#" characters that appear inside HTML attributes.
For example, in the HTML text:
hola que tal with #hash1.
hola que tal with #hash2
y <a href="hola.que.tal#hash3"> para #hash4. </a>
I want to recover "hash1", "hash2" and "hash4" but not "hash3".
I tried to resolve it with lookarounds, with the following expression:
(?<!<)#([a-z0-9_]+)(?!.*?>)
but without success.
How can I do it with a single regular expression?
This should work:
/#[a-z0-9_]+(?![^<]*>)/
See http://www.regexpal.com/?fam=95144
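The negative lookahead makes sure that the hashtag is not followed by a > without a < coming first. In other words, either a < appears between the hashtag and the next >, or there is no > after it at all, so the hashtag is not inside a tag.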
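As a quick check, here is a minimal sketch in Python that runs the pattern against the sample text from the question. Python is an assumption here, since the thread does not say which regex engine is in use, and the capture group is added only so that the name is returned without the leading "#". The names HASHTAG, html and hashtags are made up for the example.

```python
import re

# Pattern from the answer, with a capture group around the hashtag name.
# (?![^<]*>) rejects a match if a ">" can be reached without crossing a "<",
# which is the case when the "#" sits inside a tag such as <a href="...#hash3">.
HASHTAG = re.compile(r"#([a-z0-9_]+)(?![^<]*>)")

# Sample text from the question.
html = (
    "hola que tal with #hash1.\n"
    "hola que tal with #hash2\n"
    'y <a href="hola.que.tal#hash3"> para #hash4. </a>'
)

hashtags = HASHTAG.findall(html)
print(hashtags)  # ['hash1', 'hash2', 'hash4']
```

One limitation to keep in mind: a literal ">" in the visible text (for example "#a > b") would also cause the hashtag before it to be skipped, so this only holds up when ">" appears solely as part of tags.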