Best ways of parsing a URL using C?

Tags: c, url, parsing

I have a URL like this:

http://192.168.0.1:8080/servlet/rece 

I want to parse the URL to get the values:

IP: 192.168.0.1
Port: 8080
Page: /servlet/rece

How do I do that?

Jiang Bian asked Apr 07 '09 14:04



2 Answers

Personally, I steal the HTParse.c module from the W3C (it is used in the lynx Web browser, for instance). Then, you can do things like:

 strncpy(hostname, HTParse(url, "", PARSE_HOST), size) 

The advantage of a well-established, debugged library is that you avoid the typical traps of URL parsing. Many regexps, for instance, fail when the host is an IP address, especially an IPv6 one.
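To illustrate the IPv6 trap, here is a minimal hand-written sketch (not HTParse, and not a replacement for it). The function name extract_host and its return convention are assumptions made for this example. It relies on the fact that an IPv6 literal in a URL is wrapped in square brackets, so the colons inside it must not be mistaken for the port separator. It deliberately ignores user:password@ prefixes and does not validate the address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for this sketch: copies the host part of `url`
 * into `host` (capacity `size`, including the terminating NUL).
 * Returns 0 on success, -1 if the URL has no "://", the host is empty,
 * a '[' is never closed, or the buffer is too small. */
int extract_host(const char *url, char *host, size_t size)
{
    assert(url != NULL && host != NULL);

    const char *p = strstr(url, "://");
    if (p == NULL)
        return -1;
    p += 3;                         /* skip past "://" */

    const char *end;
    if (*p == '[') {
        /* IPv6 literal: the host is everything up to the closing bracket,
         * colons included. */
        p++;
        end = strchr(p, ']');
        if (end == NULL)
            return -1;
    } else {
        /* Host name or IPv4: stop at the port, path, query, or fragment. */
        end = p + strcspn(p, ":/?#");
    }

    size_t len = (size_t)(end - p);
    if (len == 0 || len >= size)
        return -1;
    memcpy(host, p, len);
    host[len] = '\0';
    return 0;
}
```

Called with "http://[::1]:8080/x" this stores "::1" in host, whereas a naive split on the first colon after the scheme would yield "[" only.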

bortzmeyer answered Oct 01 '22 20:10


I wrote some simple code using sscanf that can parse very basic URLs.

    #include <stdio.h>

    int main(void)
    {
        const char text[] = "http://192.168.0.2:8888/servlet/rece";
        char ip[100];
        int port = 80;
        char page[100];

        /* Expect exactly three fields: host, port, and the path after the slash. */
        if (sscanf(text, "http://%99[^:]:%5d/%99[^\n]", ip, &port, page) != 3) {
            fprintf(stderr, "could not parse \"%s\"\n", text);
            return 1;
        }
        printf("ip = \"%s\"\n", ip);
        printf("port = \"%d\"\n", port);
        printf("page = \"%s\"\n", page);
        return 0;
    }

Output:

    $ ./urlparse
    ip = "192.168.0.2"
    port = "8888"
    page = "servlet/rece"
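The single sscanf call above needs an explicit port and drops the leading slash from the page. A possible variation, still sscanf-based, scans in stages and uses %n to track how far each stage got. The names struct url_parts and parse_http_url are made up for this sketch. It only handles plain http:// URLs with a host name or IPv4 address, so bracketed IPv6 hosts and user info are out of scope.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct url_parts {          /* hypothetical type for this sketch */
    char host[100];
    int  port;
    char page[100];
};

/* Returns 0 on success, -1 if `url` does not look like
 * http://host[:port][/page]. Defaults: port 80, page "/". */
int parse_http_url(const char *url, struct url_parts *out)
{
    assert(url != NULL && out != NULL);

    out->port = 80;
    strcpy(out->page, "/");

    /* The host stops at ':' or '/', so a missing port does not
     * swallow the path. %n records how many characters were consumed. */
    int used = 0;
    if (sscanf(url, "http://%99[^:/]%n", out->host, &used) != 1)
        return -1;
    const char *rest = url + used;

    if (*rest == ':') {
        int n = 0;
        if (sscanf(rest, ":%5d%n", &out->port, &n) != 1)
            return -1;
        if (out->port < 1 || out->port > 65535)
            return -1;
        rest += n;
    }

    if (*rest == '/') {
        /* Copy the path including its leading slash. */
        if (sscanf(rest, "%99[^\n]", out->page) != 1)
            return -1;
    } else if (*rest != '\0') {
        return -1;          /* unexpected characters after host or port */
    }
    return 0;
}
```

For "http://192.168.0.2:8888/servlet/rece" this fills in host "192.168.0.2", port 8888, and page "/servlet/rece". For "http://example.com/index.html" the port stays at the default of 80.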
Jiang Bian answered Oct 01 '22 18:10