
Regex for extracting all URLs from a string

Tags: regex, php

I'm trying to extract URLs from strings. I have different posts that contain URLs in their messages. I've prepared a pattern to match them, but it's not working properly.

Tried Regex

$pattern1= '%\b((https?://)|(www\.)|(^[\D]+\.))[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))%';
$pattern2= '%\b^((https?://)|(www\.)|(^[a-z]+\.))([a-z0-9-].?)+(:[0-9]+)?(/.*)?$%';

CODE

for ($i = 0; $i < $resultcount; $i++) {
    $pattern = '%\b^((https?://)|(www\.)|(^[a-z]+\.))([a-z0-9-].?)+(:[0-9]+)?(/.*)?$%';
    $message = (string) $result[$i]['message'];
    preg_match_all($pattern, $message, $match);
    print_r($match);
}

An example of one of my posts:

"This is just a post to test regex for extracting URL http://google.com, https://www.youtube.com/watch?v=dlw32af https://instagram.com/oscar/ en.wikipedia.org"

A post may or may not use commas to separate multiple URLs.

Thank you people :)

Mr. Pyramid asked Mar 08 '23 04:03



1 Answer

This should get you started:

\b(?:https?://)?(?:(?i:[a-z]+\.)+)[^\s,]+\b


Broken down, this says:
\b                   # a word boundary
(?:https?://)?       # http:// or https://, optional
(?:(?i:[a-z]+\.)+)   # one or more letter-only labels, each followed by a dot (case-insensitive)
[^\s,]+              # one or more characters that are neither whitespace nor a comma
\b                   # another word boundary

See a demo on regex101.com.
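
For reference, here is a minimal sketch of how the pattern above could be plugged into preg_match_all, using the example post from the question. The function name extractUrls and the ~ delimiter are my own choices for illustration, not something from the question; treat it as a starting point rather than a complete URL matcher.

```php
<?php
// Hypothetical helper (name chosen for illustration): returns every
// substring of $text that matches the pattern from the answer above.
function extractUrls(string $text): array
{
    // "~" is used as the delimiter so the slashes in "https?://" need no escaping.
    $pattern = '~\b(?:https?://)?(?:(?i:[a-z]+\.)+)[^\s,]+\b~';

    // $matches[0] holds the full matches; the pattern has no capturing groups.
    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

// The example post from the question.
$message = 'This is just a post to test regex for extracting URL http://google.com, '
    . 'https://www.youtube.com/watch?v=dlw32af https://instagram.com/oscar/ en.wikipedia.org';

print_r(extractUrls($message));
```

Note that the closing \b makes the match end on a word character, so the trailing slash of https://instagram.com/oscar/ is not part of the match. The pattern is also deliberately loose: anything shaped like letters, a dot, then more text (for example a file name such as notes.txt) will match too, so you may want to tighten it for your data.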

Jan answered Mar 10 '23 10:03
