Getting the domain of a URL with Regular Expressions

Question

I'm trying to get the domain of a given URL. For example, http://www.facebook.com/someuser/ should return facebook.com. The given URL can be in any of these formats:

https://www.facebook.com/someuser (www. is optional, but should be ignored)
www.facebook.com/someuser (http:// is not required)
facebook.com/someuser
http://someuser.tumblr.com -> this has to return tumblr.com only

I wrote this regex:

/(?:\.|\/{2})(?:www\.)?([^\/]*)/i

But it does not work as I expect.

I can do this in parts:

Remove http:// or https://, if present in the string, with string.sub(/https?:\/\//i, "").
Remove www. with string.sub(/www\./i, "").
Get the domain with match and /(\w+\.\w+)+/i

But this won't work with subdomains. Strings for testing:

https://www.facebook.com/username
http://last.fm/user/username
www.google.com
facebook.com/username
http://sub.tumblr.com/
sub.tumblr.com

I need this to work with as little memory and processing cost as possible.

Any ideas?

Maurício Linhares · Accepted Answer

Why don't you just use the URI class to do this?

require 'uri'
URI.parse(your_uri).host

And you're done.

Just one thing: if there's no "http://" or "https://" at the beginning of the URL, you'll have to add one, or the parse method won't give you a host (it will be nil).
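
To make that concrete, here is a minimal sketch of how the scheme check and the host lookup might be combined. The helper name base_domain is hypothetical, and taking the last two dot-separated labels is a simplifying assumption that fits the test strings in the question but would be wrong for suffixes such as co.uk, where a public suffix list would be needed.

```ruby
require "uri"

# Hypothetical helper (not part of the URI library): returns the last two
# labels of the host, e.g. "tumblr.com", or nil if no host can be found.
def base_domain(url)
  # URI.parse only fills in the host when a scheme is present,
  # so prepend one if the string does not start with http:// or https://.
  url = "http://#{url}" unless url =~ %r{\Ahttps?://}i

  host = URI.parse(url).host
  return nil if host.nil?

  # Assumption: the domain is the last two dot-separated labels.
  # This drops "www." and any other subdomain in one step.
  host.downcase.split(".").last(2).join(".")
rescue URI::InvalidURIError
  # Strings that are not parseable as URIs yield nil instead of raising.
  nil
end

puts base_domain("https://www.facebook.com/username")
puts base_domain("sub.tumblr.com")
```

With the strings from the question, this should print facebook.com and tumblr.com. No regular expression is applied to the host itself; the only pattern is the cheap anchored check for the scheme.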

Tags: string, regex, url, parsing, ruby

Fábio Perez
