
Could you share a link to a URL parsing implementation? [closed]

As far as I understand, a URL consists of the following fields:

  • Protocol (http, https, ftp, etc.)
  • User name
  • User Password
  • Host address (an IP address or a DNS FQDN)
  • Port (which can be implied)
  • Path to a document inside the server's document root
  • Set of arguments and values
  • Document part (#)

arranged as:

protocol://user:password@host:port/path/document?arg1=val1&arg2=val2#part

I need code that returns the value (or a null/empty value if not set) of any of these fields for any given URL string. Do I have to implement this myself, or is there existing code for this so that I don't need to reinvent the wheel?

I am particularly interested in Scala or Java code. C#, PHP, Python or Perl code can also be useful.

Ivan asked Oct 21 '10 18:10



2 Answers

The URL class gives you everything you need. See http://download.oracle.com/javase/6/docs/api/java/net/URL.html

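import java.net.URL;

// The placeholders "protocol" and "port" are not a known protocol or a number,
// so new URL(...) would throw MalformedURLException; real values are used here.
URL url = new URL("http://user:password@host:8080/path/document?arg1=val1&arg2=val2#part");
url.getProtocol();  // "http"
url.getUserInfo();  // "user:password"
url.getAuthority(); // "user:password@host:8080"
url.getHost();      // "host"
url.getPort();      // 8080, or -1 if the URL does not specify a port
url.getPath();      // "/path/document"; the document name is contained within the path
url.getQuery();     // "arg1=val1&arg2=val2"
url.getRef();       // "part", the text after #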
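
The question lists the user name, the password and the individual arguments as separate fields, while getUserInfo() and getQuery() each return a single string. A minimal sketch of splitting them further could look like the following. The class UrlDetails and its two helper methods are hypothetical names made up for this illustration, not part of java.net. It assumes that values stay percent-encoded and that a repeated argument name keeps only its last value.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, written only for this illustration.
class UrlDetails {

    // Returns {user, password}; either element is null when it is not present.
    static String[] splitUserInfo(String userInfo) {
        if (userInfo == null) {
            return new String[] {null, null};
        }
        int colon = userInfo.indexOf(':');
        if (colon < 0) {
            return new String[] {userInfo, null};
        }
        return new String[] {userInfo.substring(0, colon), userInfo.substring(colon + 1)};
    }

    // Splits "a=1&b=2" into an ordered map. Values are NOT percent-decoded,
    // an argument without '=' maps to null, and a repeated name keeps its last value.
    static Map<String, String> queryParams(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? null : pair.substring(eq + 1);
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = new URL("http://user:password@host:8080/path/document?arg1=val1&arg2=val2#part");
        String[] userAndPassword = splitUserInfo(url.getUserInfo());
        System.out.println("user = " + userAndPassword[0]);
        System.out.println("password = " + userAndPassword[1]);
        queryParams(url.getQuery())
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Both helpers take plain strings, so they work equally well on the results of java.net.URI.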
Codemwnci answered Oct 25 '22 09:10


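Use the java.net.URI class for this. java.net.URL is for real resources and real protocols: it only accepts a protocol it has a handler for and throws MalformedURLException otherwise. java.net.URI only parses the syntax, so it also works for possibly non-existent protocols and resources.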
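
As a rough sketch of this approach, the following reads every field from the question through java.net.URI and reports a missing field as null. The class UriFields, its parse method and the made-up scheme "myproto" are assumptions for illustration only. Note that getPort() returns -1 when no port is given, which the sketch converts to null.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, written only for this illustration.
class UriFields {

    // Splits a URI string into named fields; a field that is absent maps to null.
    static Map<String, String> parse(String text) throws URISyntaxException {
        URI uri = new URI(text);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("scheme", uri.getScheme());
        fields.put("userInfo", uri.getUserInfo());
        fields.put("host", uri.getHost());
        // getPort() signals "no port" with -1; report that as null instead.
        fields.put("port", uri.getPort() == -1 ? null : String.valueOf(uri.getPort()));
        fields.put("path", uri.getPath());
        fields.put("query", uri.getQuery());
        fields.put("fragment", uri.getFragment());
        return fields;
    }

    public static void main(String[] args) throws URISyntaxException {
        // "myproto" is a made-up scheme; URI does not need a handler for it.
        parse("myproto://user:password@host:8080/path/document?arg1=val1&arg2=val2#part")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A string that is not syntactically valid makes the URI constructor throw URISyntaxException, which the caller has to handle.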

user207421 answered Oct 25 '22 09:10