
Get domain name (not subdomain) in PHP

I have a URL which can be any of the following formats:

    http://example.com
    https://example.com
    http://example.com/foo
    http://example.com/foo/bar
    www.example.com
    example.com
    foo.example.com
    www.foo.example.com
    foo.bar.example.com
    http://foo.bar.example.com/foo/bar
    example.net/foo/bar

Essentially, I need to be able to match any normal URL. How can I extract example.com (or example.net, or whatever the TLD happens to be; this needs to work with any TLD) from all of these with a single regex?
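Since the question asks for a single regex, here is a minimal sketch of one in PHP. It assumes the part you want is simply the last two dot-separated labels of the host (the same assumption the first answer makes), so it would return co.uk for a host like www.example.co.uk. The function name base_domain_regex is hypothetical.

```php
<?php
// Hypothetical helper. Assumes the wanted domain is the last two labels
// of the host, so multi-part suffixes such as co.uk are NOT handled.
function base_domain_regex(string $url): ?string
{
    $pattern = '~^(?:[a-z][a-z0-9+.-]*://)?'   // optional scheme, e.g. http://
             . '(?:[^/?#@]*@)?'                // optional user:pass@
             . '(?:[^/?#:.]+\.)*'              // any number of subdomain labels
             . '([^/?#:.]+\.[^/?#:.]+)'        // capture: the last two labels
             . '(?::\d+)?(?:[/?#]|$)~i';       // optional port, then path or end
    return preg_match($pattern, $url, $m) ? strtolower($m[1]) : null;
}
```

All of the sample inputs in the question reduce to example.com or example.net with this pattern.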

Asked Apr 21 '10 00:04 by Cyclone

2 Answers

Well, you can use parse_url to get the host:

    $info = parse_url($url);
    $host = $info['host'];
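One caveat: parse_url() only fills in 'host' when the URL has a scheme (or starts with //). For an input such as www.example.com the whole string lands under 'path', so $info['host'] is undefined. A minimal sketch, assuming it is acceptable to prepend http:// to schemeless inputs (get_host is a hypothetical helper name):

```php
<?php
// Hypothetical helper: returns the lower-cased host of $url, or null.
function get_host(string $url): ?string
{
    // parse_url() only recognizes the host when a scheme is present,
    // so schemeless inputs such as "example.com/foo" get "http://" prepended.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }
    // With PHP_URL_HOST, parse_url() returns the host string, null if absent,
    // or false for a seriously malformed URL.
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}
```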

Then you can split the host on its periods and keep only the last two parts, the domain name and the TLD:

    $host_names = explode(".", $host);
    $bottom_host_name = $host_names[count($host_names)-2] . "." . $host_names[count($host_names)-1];
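The same "last two labels" idea can be written with array_slice(), which also avoids an undefined-index notice for a single-label host such as localhost. This is a sketch of an equivalent formulation, not the answer's original code; bottom_host_name is a hypothetical function name.

```php
<?php
// Hypothetical helper: keeps the last two dot-separated labels of $host.
function bottom_host_name(string $host): string
{
    // Drop a trailing root dot ("example.com.") before splitting.
    $labels = explode('.', rtrim($host, '.'));
    // A negative offset keeps the last two labels; if there are fewer
    // than two (e.g. "localhost"), the whole array is returned.
    return implode('.', array_slice($labels, -2));
}
```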

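Not very elegant, but it should work for hosts like foo.bar.example.com. Be aware that it always keeps the last two labels, so www.example.co.uk gives co.uk.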
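Because a two-label rule turns www.example.co.uk into co.uk, handling such hosts needs a list of multi-part public suffixes (in practice, the Public Suffix List). The sketch below only illustrates the idea with a tiny hard-coded set; MULTI_PART_SUFFIXES and registrable_domain are hypothetical names, and the list is deliberately incomplete.

```php
<?php
// Hypothetical, deliberately tiny suffix set for illustration only;
// real code should load the full Public Suffix List instead.
const MULTI_PART_SUFFIXES = ['co.uk', 'org.uk', 'com.au'];

function registrable_domain(string $host): string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $lastTwo = implode('.', array_slice($labels, -2));
    // If the last two labels form a known suffix, keep three labels instead.
    $keep = in_array($lastTwo, MULTI_PART_SUFFIXES, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}
```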


If you want an explanation, here it is:

First, we grab the host (the part after the scheme, such as http://, and before the path) by using parse_url's capabilities to... well... parse URLs. :)

Then we take the host name and split it into an array at each period, so test.world.hello.myname would become:

    array("test", "world", "hello", "myname");

After that, we take the number of elements in the array (4).

Then we subtract 2 from that count to get the index of the second-to-last string (the domain name, i.e. example in your example).

Then we subtract 1 from the count to get the index of the last string (because array keys start at 0), which is the TLD.

Then we combine those two parts with a period, and you have your base host name.
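To make the index arithmetic concrete, here is the same sequence traced on the example host from the explanation. It does nothing beyond the two lines of code in the answer; the intermediate variable names are only added for readability.

```php
<?php
// Walks through the steps above for the host used in the explanation.
$host = 'test.world.hello.myname';
$host_names = explode('.', $host);          // ["test", "world", "hello", "myname"]
$count = count($host_names);                 // 4
$second_to_last = $host_names[$count - 2];   // index 2: "hello"
$last = $host_names[$count - 1];             // index 3: "myname" (the TLD)
$bottom_host_name = $second_to_last . '.' . $last;
echo $bottom_host_name, "\n";                // prints "hello.myname"
```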

Answered Sep 24 '22 03:09 by Tyler Carter


My solution is in https://gist.github.com/pocesar/5366899

and the tests are here: http://codepad.viper-7.com/GAh1tP

It works with any TLD and with hideous subdomain patterns (up to 3 subdomains).

There's a test included with many domain names.

I won't paste the function here because of the awkward code indentation on Stack Overflow (it could use fenced code blocks, like GitHub).

Answered Sep 22 '22 03:09 by pocesar