
Regex to match a word (URL) only if it does not contain a given character

I'm using an API that sometimes truncates links inside the text that it returns and instead of "longtexthere https://fancy.link" I get "longtexthere https://fa…".

I'm trying to match a link only if it's complete, in other words only if it does not contain the "…" character.

So far I am able to get links by using the following regex:

((?:https?:)?\/\/\S+\/?)

but obviously it returns every link including broken ones.

I've tried to do something like this:

((?:https?:)?\/\/(?:(?!…)\S)+\/?)

Although that stopped matching the "…" character, it still returned the link, just without the character: for "https://fa…" it returned "https://fa", whereas I simply want it to skip that broken link and move on.
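For anyone who wants to reproduce the behaviour described above, here is a minimal sketch in Python. The language is my assumption (the question does not name a regex engine), and the sample text is made up from the strings in the question.

```python
import re

# Sample text built from the strings in the question (made up for illustration).
text = "first https://fancy.link then https://fa… end"

# Original pattern: matches every link, including the truncated one.
plain = re.compile(r"((?:https?:)?//\S+/?)")

# Attempted fix: the tempered token stops before "…", but the engine is
# still happy to report the truncated prefix as a match.
tempered = re.compile(r"((?:https?:)?//(?:(?!…)\S)+/?)")

plain_matches = plain.findall(text)
tempered_matches = tempered.findall(text)

print(plain_matches)     # ['https://fancy.link', 'https://fa…']
print(tempered_matches)  # ['https://fancy.link', 'https://fa']
```

The second pattern only restricts which characters may be consumed; nothing in it says that the match must not be followed by "…", so the prefix "https://fa" is still a valid match.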

Been fighting this for hours and just can't get my head around it. :(

Thanks for any help in advance.

kiradotee asked Apr 01 '16 13:04


3 Answers

You can use

(?:https?:)?\/\/[^\s…]++(?!…)\/?

See the regex demo. The possessive quantifier [^\s…]++ matches all characters that are neither whitespace nor "…" without later backtracking, and the lookahead (?!…) then checks that the next character is not "…". If it is, no match is found.
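A rough sketch of the same idea in Python, assuming that is the target language (the question does not say). Python's re module only supports possessive quantifiers from version 3.11, so the sketch falls back to a well-known emulation on older versions: a lookahead captures the run and a backreference consumes it, which leaves the engine nothing to backtrack into. The helper name complete_links is hypothetical.

```python
import re
import sys

if sys.version_info >= (3, 11):
    # Possessive quantifier, as in the pattern above.
    pattern = re.compile(r"(?:https?:)?//[^\s…]++(?!…)/?")
else:
    # Emulation: (?=(X+))\1 consumes the longest run of X and, because a
    # lookahead is never re-entered, the run cannot be shortened afterwards.
    pattern = re.compile(r"(?:https?:)?//(?=([^\s…]+))\1(?!…)/?")


def complete_links(text):
    """Return only the links that are not cut off by a trailing "…"."""
    return [m.group(0) for m in pattern.finditer(text)]


print(complete_links("first https://fancy.link then https://fa… end"))
# ['https://fancy.link']
```

Without the possessive behaviour the engine would simply give back the last character ("https://f" followed by "a" is not followed by "…"), so the truncated link would still match in a shortened form.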

As an alternative, if your regex engine does not allow possessive quantifiers, use a negative lookahead version:

(?!\S+…)(?:https?:)?\/\/\S+\/?

See another regex demo. The lookahead (?!\S+…) fails the match if 1+ non-whitespace characters are followed by "…".
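The lookahead version needs no special engine support, so it can be tried in most flavours. Below is a small Python sketch (the language and the sample text, including the //cdn.example/x link, are my own assumptions for illustration).

```python
import re

# The lookahead runs at the position where the link would start and rejects
# that position if the non-whitespace run beginning there contains "…".
pattern = re.compile(r"(?!\S+…)(?:https?:)?//\S+/?")

# Made-up sample: one complete link, one truncated link, one
# protocol-relative link.
text = "longtexthere https://fancy.link and https://fa… plus //cdn.example/x"

links = pattern.findall(text)
print(links)  # ['https://fancy.link', '//cdn.example/x']
```

Note that the check covers the whole whitespace-delimited token, so a link is rejected when "…" appears anywhere in it, not only at its end.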

Wiktor Stribiżew answered Oct 18 '22 16:10


You can try the following regex:

https?:\/\/\w+(?:\.\w+\/?)+(?!\.{3})(\s|$)

See demo https://regex101.com/r/bS6tT5/3

Saleem answered Oct 18 '22 15:10


Try:

 ((?:https?:)?\/\/\S+[^ \.]{3}\/?)

It's the same as your original pattern; you just tell it that the last three characters should not be '.' (period) or ' ' (space).

UPDATE: Your second link worked.

And if you tweak your regex just slightly, it will do what you want:

 ((?:https?:)?\/\/\S+[^ …] \/?)

Yes, it looks just like what you had, except that I added a ' ' (space) after the part we do not want. This forces the regular expression to match up to and including the space, which it cannot do for a URL that contains the "…" character. Without the space at the end, it would match up to but not including the "…", which is why it was not doing what we wanted ;)

Rob answered Oct 18 '22 14:10