
Parsing URLs in .NET

Tags: c#, .net, url, uri

I'm looking for a .NET Framework class that can parse URLs.

Some examples of URLs that require parsing:

  • server:8088
  • server:8088/func1
  • server:8088/func1/SubFunc1
  • http://server
  • http://server/func1
  • http://server/func/SubFunc1
  • http://server:8088
  • http://server:8088/func1
  • http://server:8088/func1/SubFunc1
  • magnet://server
  • magnet://server/func1
  • magnet://server/func/SubFunc1
  • magnet://server:8088
  • magnet://server:8088/func1
  • magnet://server:8088/func1/SubFunc1

The problem is that the Uri and UriBuilder classes do not handle the URLs correctly. For example, they are confused by:

stackoverflow.com:8088
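
To see the confusion first-hand, a small probe like the one below can help. It is only a sketch: UriProbe and Describe are hypothetical names, not part of the .NET Framework. For an input of the form host:port with no scheme, the results table further down reports that the host name comes back as the scheme and the port number as the path.

```csharp
using System;

// Hypothetical helper (not a .NET Framework type): reports how System.Uri
// splits an input string into scheme, host, port and path.
public static class UriProbe
{
    public static string Describe(string input)
    {
        Uri uri;
        // TryCreate avoids an exception when the input is not an absolute URI.
        if (!Uri.TryCreate(input, UriKind.Absolute, out uri))
            return "not an absolute URI";

        return string.Format("scheme={0} host={1} port={2} path={3}",
            uri.Scheme, uri.Host, uri.Port, uri.AbsolutePath);
    }
}
```

Calling Console.WriteLine(UriProbe.Describe("stackoverflow.com:8088")) prints the parts that System.Uri actually assigns to that input.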

Background on URLs

The format of a URL is:

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \_________/ \__/\_________/\__________/ \__/
   |         |        |     |           |        |
scheme      host    port   path       query   fragment

In our case, we only care about:

  • Uri.Scheme
  • Uri.Host
  • Uri.Port
  • Uri.AbsolutePath (UriBuilder.Path)

Tests

Running some tests, we can check how the UriBuilder class handles various URIs:

//                                       Expected  Expected  Expected    Expected
//Test URI                               Scheme    Server    Port        Path
//=====================================  ========  ========  ====  ====================
t("server",                              "",       "server", -1,   "");
t("server/func1",                        "",       "server", -1,   "/func1");
t("server/func1/SubFunc1",               "",       "server", -1,   "/func1/SubFunc1");
t("server:8088",                         "",       "server", 8088, "");
t("server:8088/func1",                   "",       "server", 8088, "/func1");
t("server:8088/func1/SubFunc1",          "",       "server", 8088, "/func1/SubFunc1");
t("http://server",                       "http",   "server", -1,   "");
t("http://server/func1",                 "http",   "server", -1,   "/func1");
t("http://server/func/SubFunc1",         "http",   "server", -1,   "/func/SubFunc1");
t("http://server:8088",                  "http",   "server", 8088, "");
t("http://server:8088/func1",            "http",   "server", 8088, "/func1");
t("http://server:8088/func1/SubFunc1",   "http",   "server", 8088, "/func1/SubFunc1");
t("magnet://server",                     "magnet", "server", -1,   "");
t("magnet://server/func1",               "magnet", "server", -1,   "/func1");
t("magnet://server/func/SubFunc1",       "magnet", "server", -1,   "/func/SubFunc1");
t("magnet://server:8088",                "magnet", "server", 8088, "");
t("magnet://server:8088/func1",          "magnet", "server", 8088, "/func1");
t("magnet://server:8088/func1/SubFunc1", "magnet", "server", 8088, "/func1/SubFunc1");
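
The t(...) helper itself is not shown in the question. A minimal sketch of what it might look like follows, assuming it feeds each string to the UriBuilder(string) constructor and compares the resulting properties; the names UriBuilderCheck and T are hypothetical.

```csharp
using System;

// Hypothetical stand-in for the question's t(...) helper, which is not shown.
public static class UriBuilderCheck
{
    // Returns true when UriBuilder splits url into exactly the expected parts.
    public static bool T(string url, string scheme, string host, int port, string path)
    {
        UriBuilder builder;
        try
        {
            builder = new UriBuilder(url);
        }
        catch (UriFormatException)
        {
            Console.WriteLine("{0,-38} could not be parsed", url);
            return false;
        }

        bool ok = builder.Scheme == scheme
               && builder.Host == host
               && builder.Port == port
               && builder.Path == path;

        // One row per input, flagged when any part differs from the expectation.
        Console.WriteLine("{0,-38} {1,-7} {2,-7} {3,-5} {4} {5}",
            url, builder.Scheme, builder.Host, builder.Port, builder.Path,
            ok ? "" : "<-- mismatch");
        return ok;
    }
}
```

Each row of the table above then becomes a call such as UriBuilderCheck.T("magnet://server:8088/func1", "magnet", "server", 8088, "/func1").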

Only six of the eighteen cases parse as expected; the rest fail:

Url                                  Scheme  Host    Port  Path
===================================  ======  ======  ====  ===============
server                               http    server  80    /
server/func1                         http    server  80    /func1
server/func1/SubFunc1                http    server  80    /func1/SubFunc1
server:8088                          server          -1    8088
server:8088/func1                    server          -1    8088/func1
server:8088/func1/SubFunc1           server          -1    8088/func1/SubFunc1
http://server                        http    server  80    /
http://server/func1                  http    server  80    /func1
http://server/func/SubFunc1          http    server  80    /func/SubFunc1
http://server:8088                   http    server  8088  /
http://server:8088/func1             http    server  8088  /func1
http://server:8088/func1/SubFunc1    http    server  8088  /func1/SubFunc1
magnet://server                      magnet  server  -1    /
magnet://server/func1                magnet  server  -1    /func1
magnet://server/func/SubFunc1        magnet  server  -1    /func/SubFunc1
magnet://server:8088                 magnet  server  8088  /
magnet://server:8088/func1           magnet  server  8088  /func1
magnet://server:8088/func1/SubFunc1  magnet  server  8088  /func1/SubFunc1

I said I wanted a .NET Framework class. I would also accept any code-gum lying around that I can pick up and chew, as long as it satisfies my simplistic test cases.

Bonus Chatter

I was looking at expanding an existing question, but that question is limited to HTTP only.

I also asked this same question earlier today, but I realize now that I phrased it incorrectly: I asked how to "build" a URL, when in reality I want to "parse" a user-entered URL. I can't go back and fundamentally change the title now, so I'll ask the same question again here, only better, with more clearly stated goals.

Bonus Reading

  • How can I parse HTTP urls in C#?
  • How to build a Url?

Ian Boyd asked Nov 11 '22 17:11


1 Answer

Will this regular expression do?

^((?<schema>[a-z]*)://)?(?<host>[^/:]*)?(:(?<port>[0-9]*))?(?<path>/.*)?$

It's not perfect, but it seems to work for your test cases.
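
A minimal sketch of how the pattern might be wired up in C# follows. The class name LooseUrlParser and the TryParse signature are hypothetical, and the choice of -1 for a missing port and "" for a missing scheme or path is an assumption taken from the expected values in the question's test table.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not a .NET Framework type) wrapping the regex above.
public static class LooseUrlParser
{
    // The pattern from the answer, unchanged, including the group name "schema".
    private static readonly Regex Pattern = new Regex(
        @"^((?<schema>[a-z]*)://)?(?<host>[^/:]*)?(:(?<port>[0-9]*))?(?<path>/.*)?$");

    // Returns false when the input does not match the pattern at all.
    public static bool TryParse(string url, out string scheme, out string host,
                                out int port, out string path)
    {
        scheme = ""; host = ""; port = -1; path = "";

        Match m = Pattern.Match(url);
        if (!m.Success) return false;

        scheme = m.Groups["schema"].Value;   // "" when no scheme was given
        host = m.Groups["host"].Value;
        path = m.Groups["path"].Value;       // "" when no path was given

        // Assumption: -1 means "no port", as in the question's test table.
        int parsed;
        if (m.Groups["port"].Success && int.TryParse(m.Groups["port"].Value, out parsed))
            port = parsed;
        return true;
    }
}
```

For example, LooseUrlParser.TryParse("server:8088/func1", out scheme, out host, out port, out path) yields an empty scheme, host "server", port 8088 and path "/func1". The pattern is case-sensitive, so a scheme written in upper case would not be recognized as a scheme unless RegexOptions.IgnoreCase is added.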

Luaan answered Nov 15 '22 00:11