
How To Check Whether A URL Is External URL or Internal URL With PHP?

Tags: html, php, backend

I'm getting all ahrefs of a page with this loop:

foreach($html->find('a[href!="#"]') as $ahref) {
    $ahrefs++;
}

I want to do something like this:

foreach($html->find('a[href!="#"]') as $ahref) {
    if(isexternal($ahref)) {
        $external++;
    }
    $ahrefs++;
}

where isexternal is a function like this:

function isexternal($url) {
    // FOO...

    // Test if link is internal/external
    if(/*condition is true*/) {
        return true;
    }
    else {
        return false;
    }
}

Help!

asked Apr 09 '14 13:04 by mehulmpt


2 Answers

Use parse_url and compare the host to your local host (often, but not always, the same as $_SERVER['HTTP_HOST']):

function isexternal($url) {
  $components = parse_url($url);
  // an empty host indicates a relative URL like '/relative.php', which counts as internal
  return !empty($components['host']) && strcasecmp($components['host'], 'example.com') !== 0;
}

However, this will treat www.example.com and example.com as different hosts. If you want all your subdomains to be treated as local links, the function becomes somewhat larger:

function isexternal($url) {
  $components = parse_url($url);
  if ( empty($components['host']) ) return false;  // no host: treat a URL like '/relative.php' as internal
  $host = strtolower($components['host']);
  if ( $host === 'example.com' ) return false; // URL host is exactly the local host
  $suffix = '.example.com';
  return substr($host, -strlen($suffix)) !== $suffix; // external unless the host ends with '.example.com' (a subdomain)
}
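
The hard-coded 'example.com' in the functions above can be turned into a parameter. The following is only a minimal sketch: the name is_external_url and the $localHost argument are hypothetical, not part of the answer. It assumes the caller supplies the local host, for instance from configuration or from $_SERVER['HTTP_HOST'].

```php
<?php
// Hypothetical helper (an assumption, not from the answer): same logic as
// above, but the local host is passed in instead of being hard-coded.
function is_external_url(string $url, string $localHost): bool {
    // parse_url() returns null when there is no host and false for a malformed URL
    $host = parse_url($url, PHP_URL_HOST);
    if (empty($host)) {
        return false;                       // e.g. '/relative.php' is treated as internal
    }
    $host      = strtolower($host);
    $localHost = strtolower($localHost);
    if ($host === $localHost) {
        return false;                       // exactly the local host
    }
    $suffix = '.' . $localHost;
    // external unless the host ends with '.<localHost>' (a subdomain)
    return substr($host, -strlen($suffix)) !== $suffix;
}

var_dump(is_external_url('/relative.php', 'example.com'));               // bool(false)
var_dump(is_external_url('http://www.example.com/page', 'example.com')); // bool(false)
var_dump(is_external_url('http://notexample.com/', 'example.com'));      // bool(true)
var_dump(is_external_url('https://other.org/', 'example.com'));          // bool(true)
```

Comparing against '.' . $localHost rather than $localHost alone is what keeps a host such as notexample.com from being mistaken for a subdomain.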
answered Sep 19 '22 17:09 by Ruslan Bes


Here is a simple way to tell internal and external URLs apart:

$url    = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';

$internal = (
    false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
    false !== stripos( $url, '.' . $domain ) ||  // include subdomains, like "www.my-domain.com". DANGEROUS (see below)!
    (
        0 !== strpos( $url, '//' ) &&            // exclude protocol relative URLs, like "//example.com"
        0 === strpos( $url, '/' )                // include root-relative URLs, like "/demo"
    )
);

The above check will treat www.my-domain.com and my-domain.com as being "internal".

Why this rule is dangerous:

The subdomain rule introduces a weakness that could be exploited: an external URL that contains your domain inside its path, for example https://external.com/www.my-domain.com, is treated as internal!
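
To make the weakness concrete, here is a small sketch that wraps the same conditions in a function and runs them against the URL from the example. The function name is_internal_with_subdomains is hypothetical and exists only for this demonstration.

```php
<?php
// Hypothetical wrapper around the conditions shown above, for demonstration only.
function is_internal_with_subdomains(string $url, string $domain): bool {
    return (
        false !== stripos($url, '//' . $domain) ||   // "//my-domain.com", "http://my-domain.com"
        false !== stripos($url, '.' . $domain) ||    // the risky subdomain rule
        (
            0 !== strpos($url, '//') &&              // not protocol-relative
            0 === strpos($url, '/')                  // root-relative, like "/demo"
        )
    );
}

// A genuine subdomain is accepted, as intended:
var_dump(is_internal_with_subdomains('https://www.my-domain.com/demo/', 'my-domain.com'));        // bool(true)
// The domain appears only in the path, yet the URL is still accepted:
var_dump(is_internal_with_subdomains('https://external.com/www.my-domain.com', 'my-domain.com')); // bool(true)
```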

More secure code:

This problem can be eliminated by removing subdomain support (which I suggest doing):

$url    = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';

$internal = (
    false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
    (
        0 !== strpos( $url, '//' ) &&            // exclude protocol relative URLs, like "//example.com"
        0 === strpos( $url, '/' )                // include root-relative URLs, like "/demo"
    )
);
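
One caveat may be worth testing before relying on this version: it is still a substring match, so it does not check where the host name ends. The sketch below wraps the same conditions in a hypothetical function, is_internal_no_subdomains, and tries it on two made-up URLs (the evil.example host is an assumption for illustration). If those cases matter, comparing the host returned by parse_url is likely the safer route.

```php
<?php
// Hypothetical wrapper around the conditions shown above, for demonstration only.
function is_internal_no_subdomains(string $url, string $domain): bool {
    return (
        false !== stripos($url, '//' . $domain) ||   // "//my-domain.com", "http://my-domain.com"
        (
            0 !== strpos($url, '//') &&              // not protocol-relative
            0 === strpos($url, '/')                  // root-relative, like "/demo"
        )
    );
}

$domain = 'my-domain.com';

// The path trick from the previous version is now rejected:
var_dump(is_internal_no_subdomains('https://external.com/www.my-domain.com', $domain));     // bool(false)

// Made-up URLs that still pass, because "//my-domain.com" occurs as a substring:
var_dump(is_internal_no_subdomains('https://my-domain.com.evil.example/', $domain));        // bool(true)
var_dump(is_internal_no_subdomains('https://evil.example/?next=//my-domain.com', $domain)); // bool(true)

// parse_url() reports the actual host of the first of those URLs:
var_dump(parse_url('https://my-domain.com.evil.example/', PHP_URL_HOST)); // string(26) "my-domain.com.evil.example"
```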
answered Sep 20 '22 17:09 by Philipp