
Need a regex to match a string that contains "http://" and ends with a file extension from an array

Tags: regex, php

I am looking for a PHP solution that matches URLs meeting both of the following conditions:

  1. URL contains "http://" (not necessarily begins with http://) AND
  2. URL ends with a file extension from an array.

Example of the file extension array:

$filetypes = array(
    'jpg',
    'gif',
    'png',
    'js',
    'tif',
    'pdf',
    'doc',
    'xls',
    'xlsx',
    // etc.
);

Here is the working code I want to update to meet the above requirements. Right now it returns only URLs that contain "http://", but I want it to enforce the second requirement as well.

$i = 0;
$matches = false;
foreach($all_urls as $index => $value) {
    if (preg_match('/http:/', $value)) {
        $i++;
        echo "[{$i}] {$value}<br>";
        $matches = true;
    }
}
asked May 02 '15 13:05 by user3436467


2 Answers

You can add an in_array() call to your if statement and use pathinfo() to check whether the URL's extension is in the $filetypes array.

$i = 0;
$matches = false;
foreach($all_urls as $index => $value) {
    if (preg_match('/http:/', $value) && in_array(pathinfo($value, PATHINFO_EXTENSION ), $filetypes)) {
        $i++;
        echo "[{$i}] {$value}<br>";
        $matches = true;
    }
}

EDIT:

Since you mentioned in the comments that a few URLs contain single quotes, you can strip them from the start and end of the string, as @Ghost showed in the comments:

trim($value, "'")

Then use it in the in_array() call as follows:

in_array(pathinfo(trim($value, "'"), PATHINFO_EXTENSION ), $filetypes)
                //^^^^^^^^^^^^^^^^^
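
Putting the pieces together, here is a minimal self-contained sketch of this approach. The function name hasAllowedExtension() and the sample URLs are made up for illustration. The strtolower() call is an extra assumption that upper-case extensions such as .PDF should also count. URLs that carry a query string are not handled here.

```php
<?php
// Hypothetical helper (not part of the original answer): returns true when
// $url contains "http://" and ends with an extension listed in $filetypes.
function hasAllowedExtension(string $url, array $filetypes): bool
{
    // Strip single quotes from both ends of the string.
    $url = trim($url, "'");

    // "Contains", not "starts with": http:// may appear anywhere.
    if (strpos($url, 'http://') === false) {
        return false;
    }

    // Assumption: compare extensions case-insensitively.
    $ext = strtolower(pathinfo($url, PATHINFO_EXTENSION));

    return in_array($ext, $filetypes, true);
}

// Extension list from the question, quoted and without the "etc" placeholder.
$filetypes = ['jpg', 'gif', 'png', 'js', 'tif', 'pdf', 'doc', 'xls', 'xlsx'];

// Made-up sample data.
$all_urls = [
    "http://example.com/a.jpg",
    "'http://example.com/b.PDF'",
    "ftp://example.com/a.jpg",
    "http://example.com/index.html",
];

$i = 0;
foreach ($all_urls as $value) {
    if (hasAllowedExtension($value, $filetypes)) {
        $i++;
        echo "[{$i}] {$value}<br>";
    }
}
```

Keeping the check in one function means the loop from the question stays unchanged apart from the condition.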
answered Oct 03 '22 05:10 by Rizier123


A simpler solution is to use a single regex:

$i = 0;
$matches = false;
foreach($all_urls as $index => $value) {
    if (preg_match("/^http:\/\/.+\.(jpg|gif|png|js|tif|pdf|doc|xls|xlsx|etc)$/", $value)) {
        $i++;
        echo "[{$i}] {$value}<br>";
        $matches = true;
    }
}

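This ensures the URL starts with http:// (because of the ^) and ends with .jpg or one of the other listed extensions (because of the alternation and the $). The question allows http:// to appear anywhere in the URL, so drop the ^ if that is what you need. The etc entry is the placeholder from the question's list; replace it with any further extensions you want to match.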

If you want to support https you could just use:

/^https?:\/\/.+\.(jpg|gif|png|js|tif|pdf|doc|xls|xlsx|etc)$/
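
If you would rather not hard-code the extension list, the pattern can be built from the $filetypes array instead. This is only a sketch with some assumptions of mine: the leading ^ is dropped so that http:// may appear anywhere (as the question asks), the i flag is added so upper-case extensions also match, and the sample URLs are made up.

```php
<?php
// Extension list from the question, quoted and without the "etc" placeholder.
$filetypes = ['jpg', 'gif', 'png', 'js', 'tif', 'pdf', 'doc', 'xls', 'xlsx'];

// Escape each extension for use inside a "/"-delimited pattern,
// then join them into an alternation: jpg|gif|png|...
$alternatives = implode('|', array_map(
    function ($ext) {
        return preg_quote($ext, '/');
    },
    $filetypes
));

// No leading ^, so "http://" may appear anywhere in the string.
// The $ anchors the extension at the end; i makes the match case-insensitive.
$pattern = '/http:\/\/.+\.(' . $alternatives . ')$/i';

// Made-up sample data.
$all_urls = [
    'http://example.com/a.jpg',
    'see http://example.com/b.XLSX',
    'https://example.com/c.png',
    'http://example.com/page.html',
];

// preg_grep() keeps the original keys, so reindex the result.
$matched = array_values(preg_grep($pattern, $all_urls));

foreach ($matched as $n => $value) {
    echo '[' . ($n + 1) . "] {$value}<br>";
}
```

With this version, adding a file type only requires changing the array. The https URL is skipped on purpose because the pattern looks for http:// only. Use https? in the pattern to accept both.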
answered Oct 03 '22 06:10 by Emil Ingerslev