Splits a hostname into subdomains, domain and (effective) top-level domains. Since domain name registrars organize their namespaces in different ways, it's not straight-forward to split a hostname into subdomains, the domain and top-level domains.
Simply put, a domain name (or just 'domain') is the name of a website. It's what comes after “@” in an email address, or after “www.” in a web address. If someone asks how to find you online, what you tell them is usually your domain name.
Method 1: In this method, we will use createElement() method to create a HTML element, anchor tag and then use it for parsing the given URL. Method 2: In this method we will use URL() to create a new URL object and then use it for parsing the provided URL.
Check out parse_url()
:
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'google.com'
parse_url
doesn't handle really badly mangled urls very well, but is fine if you generally expect decent urls.
$domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));
This would return the google.com
for both http://google.com/... and http://www.google.com/...
From http://us3.php.net/manual/en/function.parse-url.php#93983
for some odd reason, parse_url returns the host (ex. example.com) as the path when no scheme is provided in the input url. So I've written a quick function to get the real host:
function getHost($Address) {
$parseUrl = parse_url(trim($Address));
return trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2)));
}
getHost("example.com"); // Gives example.com
getHost("http://example.com"); // Gives example.com
getHost("www.example.com"); // Gives www.example.com
getHost("http://example.com/xyz"); // Gives example.com
function get_domain($url = SITE_URL)
{
preg_match("/[a-z0-9\-]{1,63}\.[a-z\.]{2,6}$/", parse_url($url, PHP_URL_HOST), $_domain_tld);
return $_domain_tld[0];
}
get_domain('http://www.cdl.gr'); //cdl.gr
get_domain('http://cdl.gr'); //cdl.gr
get_domain('http://www2.cdl.gr'); //cdl.gr
The code that was meant to work 100% didn't seem to cut it for me, I did patch the example a little but found code that wasn't helping and problems with it. so I changed it out to a couple of functions (to save asking for the list from Mozilla all the time, and removing the cache system). This has been tested against a set of 1000 URLs and seemed to work.
function domain($url)
{
global $subtlds;
$slds = "";
$url = strtolower($url);
$host = parse_url('http://'.$url,PHP_URL_HOST);
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
foreach($subtlds as $sub){
if (preg_match('/\.'.preg_quote($sub).'$/', $host, $xyz)){
preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
}
}
return @$matches[0];
}
function get_tlds() {
$address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
$content = file($address);
foreach ($content as $num => $line) {
$line = trim($line);
if($line == '') continue;
if(@substr($line[0], 0, 2) == '/') continue;
$line = @preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
if($line == '') continue; //$line = '.'.$line;
if(@$line[0] == '.') $line = substr($line, 1);
if(!strstr($line, '.')) continue;
$subtlds[] = $line;
//echo "{$num}: '{$line}'"; echo "<br>";
}
$subtlds = array_merge(array(
'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
), $subtlds);
$subtlds = array_unique($subtlds);
return $subtlds;
}
Then use it like
$subtlds = get_tlds();
echo domain('www.example.com') //outputs: example.com
echo domain('www.example.uk.com') //outputs: example.uk.com
echo domain('www.example.fr') //outputs: example.fr
I know I should have turned this into a class, but didn't have time.
If you want extract host from string http://google.com/dhasjkdas/sadsdds/sdda/sdads.html
, usage of parse_url() is acceptable solution for you.
But if you want extract domain or its parts, you need package that using Public Suffix List. Yes, you can use string functions arround parse_url(), but it will produce incorrect results sometimes.
I recomend TLDExtract for domain parsing, here is sample code that show diff:
$extract = new LayerShifter\TLDExtract\Extract();
# For 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
parse_url($url, PHP_URL_HOST); // will return google.com
$result = $extract->parse($url);
$result->getFullHost(); // will return 'google.com'
$result->getRegistrableDomain(); // will return 'google.com'
$result->getSuffix(); // will return 'com'
# For 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html'
$url = 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
parse_url($url, PHP_URL_HOST); // will return 'search.google.com'
$result = $extract->parse($url);
$result->getFullHost(); // will return 'search.google.com'
$result->getRegistrableDomain(); // will return 'google.com'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With