
Twitter regex only when not already a link

Tags: regex, php

I know this has been done to death already. I've found lots of topics on the subject and have taken lots of advice. However, suppose I have the following string:

@testaccount
<a href="http://twitter.com/testaccount">@testaccount</a>

Obviously, I don't want to convert the second one to a link as it already is one. I've managed to find the first one without it being an email (thanks to several questions already here).

Here is the pattern I've got already:

/(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)/

That will convert the first one perfectly, but the second one will obviously become a 'double link'.

So I managed to work out that I should use something like this (?!<\/a>). However, that only removes the last t of testaccount.
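
The lost character comes from backtracking: when the lookahead fails right after the full name, the engine gives back one character and tries again, and at that shorter position the upcoming text is t</a> rather than </a>, so the lookahead passes. A minimal PHP sketch of that behaviour, using the capture group above without its lookbehind for brevity (the variable names are only illustrative):

```php
<?php
// The question's capture group with the naive negative lookahead appended.
$pattern = '/@([A-Za-z]+[A-Za-z0-9_]+)(?!<\/a>)/';
$html = '<a href="http://twitter.com/testaccount">@testaccount</a>';

// The lookahead fails after "testaccount", so the engine backtracks;
// after "testaccoun" the upcoming text is "t</a>", which passes.
preg_match($pattern, $html, $m);
echo $m[1], "\n"; // testaccoun
```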

Essentially, I need to find a way to ignore the whole match rather than just remove one character. Is this possible?

The language I'm using is PHP.

Thanks

CircularRecursion asked Feb 01 '26 21:02


2 Answers

You could make effective use of the (*SKIP) and (*FAIL) backtracking control verbs; (*F) is shorthand for (*FAIL).

~<a[^<]*</a>(*SKIP)(*F)|@(\w+)~

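The idea is to skip any content located between an opening <a ...> tag and its closing </a>. On the left side of the alternation operator we match the subpattern we do not want, then make it fail with (*F), while (*SKIP) stops the regex engine from retrying any position inside that substring. The right side then matches the remaining @ mentions.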
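
As a rough sketch of how this pattern could be used from PHP with preg_replace (the sample string comes from the question; the replacement template and variable names are assumptions for illustration):

```php
<?php
// Left branch: consume an entire <a ...>...</a> element, then (*SKIP)(*F)
// discards it and resumes scanning after it. Right branch: a bare @mention.
$pattern = '~<a[^<]*</a>(*SKIP)(*F)|@(\w+)~';

$text = "@testaccount\n"
      . '<a href="http://twitter.com/testaccount">@testaccount</a>';

// Only matches from the right branch reach the replacement.
$linked = preg_replace(
    $pattern,
    '<a href="http://twitter.com/$1">@$1</a>',
    $text
);

echo $linked, "\n";
// Both lines now read:
// <a href="http://twitter.com/testaccount">@testaccount</a>
```

Two things to keep in mind with this sketch: [^<]* only skips anchors that contain no nested tags, and the pattern no longer carries the question's email guard, so an address such as me@example.com would still have @example linked. Putting the question's lookbehind in front of the @ should restore that check.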

hwnd answered Feb 04 '26 11:02


You need to add .*? before <\/a> inside that negative lookahead, so that it won't match @ strings that are already inside an anchor (<a>) tag.

(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z0-9_]+)(?!.*?<\/a>)
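
A short PHP sketch of how this pattern behaves, under the assumption that it is applied with preg_replace and no extra modifiers. Because . does not match a newline by default, the lookahead only scans the rest of the current line, so a bare mention that is followed by a link later on the same line is left alone too. The sample strings and variable names are made up for illustration:

```php
<?php
$pattern = '/(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z0-9_]+)(?!.*?<\/a>)/';
$replacement = '<a href="http://twitter.com/$1">@$1</a>';
$link = '<a href="http://twitter.com/testaccount">@testaccount</a>';

// Mention and existing link on separate lines: only the bare mention changes.
$twoLines = preg_replace($pattern, $replacement, "@testaccount\n" . $link);

// Same content on one line: the lookahead sees the later </a>,
// so the bare mention is skipped as well and the string stays unchanged.
$oneLine = preg_replace($pattern, $replacement, '@testaccount ' . $link);

var_dump($twoLines === $link . "\n" . $link); // bool(true)
var_dump($oneLine === '@testaccount ' . $link); // bool(true)
```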

Avinash Raj answered Feb 04 '26 12:02


