I'm trying to figure out a way to get an array of URLs from a string of text. The text will be somewhat formatted like this:
Some random text up here
http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphones-bezel-a-massive-notification-light/?grcc=88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2=835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033fdeed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~
http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tickets-for-disrupt-sf/
Obviously, those links can be anything (and there can be many links, those are just the ones I'm testing with now. If I use a simple URL like my regex works fine.
I am using:
preg_match_all('((https?|ftp|gopher|telnet|file|notes|ms-help):'.
'((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)',
$bodyMessage, $matches, PREG_PATTERN_ORDER);
When I do a print_r( $matches);
the result I get is:
Array ( [0] => Array (
[0] => http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon=
[1] => http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick=
[2] => http://techcrunch.co=
[3] => http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-ip=
[4] => http://techcrunch.com/2012/07/20/last-day-to-purc=
[5] => http://tec=
)
...
None of those items in that array are full links from the links above.
Anyone know of a good way to get what I need? I've found a bunch of regex stuff to get links for PHP, but none of it works.
Thanks!
Edit:
Ok, so i'm pulling these links from an e-mail. The script parses the email, grabs the body of the message, and then tries to grab the links from that. After investigating the email, it appears as if it is for some reason adding a space in the middle of the url. Here is the output of the body message as seen by my PHP script.
--00248c711bb99ca36d04c54ba5c6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2= =3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f= deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~ http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= ets-for-disrupt-sf/ --00248c711bb99ca36d04c54ba5c6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Any suggestions on how to make it not break the URLS?
EDIT 2
As per Laurnet's suggestion, I ran this code:
$bodyMessage = str_replace("= ", "",$bodyMessage);
However when I echo that out, it doesn't seem to want to replace "= "
--00248c711bb99ca36d04c54ba5c6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2= =3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f= deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~ http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= ets-for-disrupt-sf/ --00248c711bb99ca36d04c54ba5c6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
/**
*
* @get URLs from string (string maybe a url)
*
* @param string $string
* @return array
*
*/
function getUrls($string) {
$regex = '/https?\:\/\/[^\" ]+/i';
preg_match_all($regex, $string, $matches);
//return (array_reverse($matches[0]));
return ($matches[0]);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With