What's the best way to parse URLs to extract the domain? [duplicate]

Possible Duplicate:
Ruby code to extract host from URL string

I found a module called URI that can parse the URL. (I'm pretty new to Ruby. Is 'module' synonymous with 'library' in this case?) You can then extract the host name.

require 'uri'

uri = URI.parse("http://www.ruby-lang.org/")
...
p uri.host
# => "www.ruby-lang.org"

From this, I suppose you could remove 'www.' and keep other subdomains using regular expressions.
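As a sketch of that idea (using the `uri` standard library; the URLs here are just illustrative), you could strip a leading 'www.' while leaving any other subdomains intact:

```ruby
require 'uri'

# Parse the URL and take just the host component.
host = URI.parse("http://www.ruby-lang.org/").host

# Strip a leading "www." if present; other subdomains are untouched.
domain = host.sub(/\Awww\./, '')
# domain => "ruby-lang.org"
```

Note that the `\A` anchor matches only at the start of the string, so a host like `blog.example.com` passes through unchanged.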

Does anyone have a more straightforward alternative, or does this approach seem right?

Marc Ripley asked Feb 27 '11 12:02


1 Answer

In posting my own answer, I'm not saying that gems like domainatrix or public_suffix_server aren't good, elegant solutions (although the latter broke on me immediately, which is what sent me down this route).

People suggesting split() made me realize that I could simply strip a leading 'www.' if it exists and otherwise leave the domain as-is, without installing any gems, using one simple line of code:

url = request.original_url
domain = URI.parse(url).host.sub(/\Awww\./, '')

This works with subdomains and multi-part suffixes (e.g. co.uk). Does anybody see anything wrong with this?
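A quick check of those cases (outside Rails, so using literal URLs instead of `request.original_url`; the example URLs are made up):

```ruby
require 'uri'

# Strip a leading "www." from a URL's host, keeping other subdomains.
strip_www = ->(url) { URI.parse(url).host.sub(/\Awww\./, '') }

strip_www.call("http://www.example.com/")        # => "example.com"
strip_www.call("http://blog.example.com/")       # => "blog.example.com"
strip_www.call("http://www.example.co.uk/path")  # => "example.co.uk"
```

One caveat: `URI.parse("example.com").host` is nil, because there's no scheme, so you'd want to guard against scheme-less input before calling sub on the result.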

EDIT: Thanks to sorens for pointing out the weak regex I was originally using. This expression is certainly better.

Marc Ripley answered Oct 07 '22 03:10