How to parse a URL and extract the required substring

Tags:

parsing

ruby

Say I have a string like this: "http://something.example.com/directory/"

What I want to do is parse this string and extract the "something" from it.

The first step is obviously to check that the string contains "http://"; otherwise, the string should be ignored.

But how do I then extract just the "something" from that string? Assume that all the strings being evaluated will have a similar structure (i.e. I am trying to extract the subdomain of the URL, if the string being examined is indeed a valid URL, where valid means it starts with "http://").

Thanks.

P.S. I know how to check the first part, i.e. I can simply split the string at "http://", but that doesn't solve the full problem because that will produce "something.example.com/directory/". All I want is the "something", nothing else.
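For comparison, carrying the split idea from the P.S. through to the end with plain string methods could look like this minimal sketch. It assumes every input has the shape `http://<subdomain>.<domain>/...`; the helper name `extract_subdomain_naive` is hypothetical, and nothing here actually validates the URL.

```ruby
# Hypothetical helper: string-only approach, no real URL parsing.
# Assumes inputs look like "http://<subdomain>.<domain>/...".
# Requires Ruby 2.5+ for String#delete_prefix.
def extract_subdomain_naive(str)
  prefix = "http://"
  # Ignore strings that do not start with the scheme prefix.
  return nil unless str.start_with?(prefix)

  # Strip the prefix and keep everything before the first "/" (the host).
  host = str.delete_prefix(prefix).split("/").first
  return nil if host.nil?

  # Treat the first dot-separated label as the subdomain.
  host.split(".").first
end

extract_subdomain_naive("http://something.example.com/directory/") # => "something"
extract_subdomain_naive("ftp://something.example.com/")            # => nil
```

Because this only slices strings, it mishandles inputs such as `http://user@something.example.com/`, where it returns `"user@something"`; a real URL parser avoids that.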

marcamillion asked Nov 06 '12 01:11



1 Answer

I'd do it this way:

require 'uri'

uri = URI.parse('http://something.example.com/directory/')
uri.host.split('.').first
# => "something"

URI is part of Ruby's standard library. It's not the most full-featured, but it's plenty capable of doing this task for most URLs. If you have IRIs, look at Addressable::URI (from the addressable gem).
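A slightly fuller version of the same idea also covers the question's requirement to ignore anything that isn't an http:// URL. This is a sketch under stated assumptions: the helper name `subdomain_of` is hypothetical, only the `http` scheme is accepted, and a host with fewer than three dot-separated labels is treated as having no subdomain (which is wrong for suffixes like `co.uk`). It uses only the standard `uri` library.

```ruby
require 'uri'

# Hypothetical helper: returns the first host label of an http:// URL,
# or nil when the string is not a parseable http URL with a subdomain.
def subdomain_of(str)
  uri = URI.parse(str)
  # Only accept the http scheme, as the question requires.
  return nil unless uri.scheme == "http"
  return nil if uri.host.nil?

  labels = uri.host.split(".")
  # Assumption: fewer than three labels (e.g. "example.com") means no subdomain.
  # This ignores multi-part public suffixes such as "co.uk".
  labels.length >= 3 ? labels.first : nil
rescue URI::InvalidURIError
  # Strings that URI cannot parse at all are ignored.
  nil
end

subdomain_of("http://something.example.com/directory/") # => "something"
subdomain_of("http://me@something.example.com/")         # => "something"
subdomain_of("http://example.com/")                      # => nil
subdomain_of("not a url")                                # => nil
```

To accept `https` as well, the scheme check could become `%w[http https].include?(uri.scheme)`.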

the Tin Man answered Dec 27 '22 05:12
