 

Getting the domain of a URL with Regular Expressions

I'm trying to get the domain of a given URL. For example, http://www.facebook.com/someuser/ should return facebook.com. The given URL can be in any of these formats:

  1. https://www.facebook.com/someuser (www. is optional, but should be ignored)
  2. www.facebook.com/someuser (http:// is not required)
  3. facebook.com/someuser
  4. http://someuser.tumblr.com -> this has to return tumblr.com only

I wrote this regex:

/(?:\.|\/{2})(?:www\.)?([^\/]*)/i

But it does not work as I expect.

I can do this in parts:

  1. Remove http:// or https://, if present in the string, with string.sub(/https?:\/\//i, "")
  2. Remove www. with string.sub(/www\./i, "")
  3. Get the domain with string.match(/(\w+\.\w+)+/i)
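The three steps above can be sketched as runnable Ruby, assuming url is a plain String. String#sub is used instead of String#delete, because delete treats its argument as a set of characters to remove, not as a pattern. The method name domain_in_parts is hypothetical.

```ruby
# Runnable version of the three steps; the method name is hypothetical.
def domain_in_parts(url)
  s = url.sub(%r{https?://}i, "")  # 1. drop the first "http://" or "https://"
  s = s.sub(/www\./i, "")          # 2. drop the first "www."
  m = s.match(/(\w+\.\w+)+/i)      # 3. first "word.word" run in what is left
  m && m[0]
end

puts domain_in_parts("https://www.facebook.com/username") # prints facebook.com
puts domain_in_parts("sub.tumblr.com")                    # prints sub.tumblr
```

The second call shows the subdomain problem: the first word.word run wins, so sub.tumblr.com yields sub.tumblr rather than tumblr.com.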

But this doesn't work with subdomains. Strings for testing:

https://www.facebook.com/username
http://last.fm/user/username
www.google.com
facebook.com/username
http://sub.tumblr.com/
sub.tumblr.com
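A single-pattern sketch that handles every test string above, assuming the "domain" is always the last two dot-separated labels of the host. That assumption is wrong for multi-part suffixes such as example.co.uk (it would return co.uk), and the names DOMAIN_RE and domain_via_regex are hypothetical. It anchors at the start of the string and makes both the scheme and the subdomain part optional, instead of requiring a leading "." or "//" before the captured text as the pattern above does.

```ruby
# Optional scheme, optional subdomains ("www.", "sub."), then capture the
# last two labels, which must be followed by "/", "?", ":" (port) or the end.
# Defined once as a constant and reused for every call.
DOMAIN_RE = %r{\A(?:https?://)?(?:[^/?:]*\.)?([^./?:]+\.[^./?:]+)(?=[/?:]|\z)}i

# Hypothetical helper; returns nil when nothing looks like a domain.
def domain_via_regex(url)
  m = DOMAIN_RE.match(url)
  m && m[1].downcase
end

%w[
  https://www.facebook.com/username
  http://last.fm/user/username
  www.google.com
  facebook.com/username
  http://sub.tumblr.com/
  sub.tumblr.com
].each { |url| puts "#{url} -> #{domain_via_regex(url).inspect}" }
```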

I need this to work with as little memory and processing cost as possible.

Any ideas?

Fábio Perez asked Jul 25 '11 at 22:07



1 Answer

Why don't you just use the URI class to do this?

require 'uri'
URI.parse(your_uri).host

And you're done.

One caveat: if there's no "http://" or "https://" at the beginning of the URL, you'll have to add one; otherwise the parse method won't give you a host (host will be nil).
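A minimal sketch of this approach, assuming inputs look like the test strings in the question. It prepends http:// when no scheme is present, as described above. Note that URI#host returns the full host name (www.facebook.com, sub.tumblr.com), so the second helper adds a hypothetical extra step that keeps only the last two labels to get facebook.com or tumblr.com; like any last-two-labels rule, it is wrong for suffixes such as co.uk. Both method names are hypothetical, and URI.parse can raise URI::InvalidURIError on malformed input.

```ruby
require "uri"

# Prepend a scheme when none is present; without one, URI.parse treats
# "facebook.com/username" as a relative path and #host returns nil.
def host_via_uri(url)
  url = "http://#{url}" unless url.match?(%r{\Ahttps?://}i)
  URI.parse(url).host
end

# Hypothetical extra step: #host is the full host name ("www.facebook.com"),
# so keep only the last two labels. Wrong for suffixes such as "co.uk".
def domain_via_uri(url)
  host = host_via_uri(url)
  host && host.split(".").last(2).join(".")
end

puts host_via_uri("https://www.facebook.com/username") # prints www.facebook.com
puts domain_via_uri("http://sub.tumblr.com/")          # prints tumblr.com
```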

Maurício Linhares answered Nov 10 '22 at 00:11
