
Parsing a URL without a path but with a slash in the query

Tags:

url

php

I have problems parsing a URL that doesn't have a path but has a slash in the query. For example: http://example.com?q=a/b

I'm aware that such a URL is most likely invalid (*): it requires at least a slash as the path, like this: http://example.com/?q=a/b.

All browsers in which I tried such a URL correct it automatically. That is basically what I want to reproduce: identify and correct such a URL.

Using parse_url, however, produces:

var_dump( parse_url('http://example.com?q=a/b') );

array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(15) "example.com?q=a"
  ["path"]=>
  string(2) "/b"
}

With a URL that has no slash in the query, it works fine:

var_dump( parse_url('http://example.com?q=ab') );

array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(11) "example.com"
  ["query"]=>
  string(4) "q=ab"
}

All external libraries I tried (Jwage\Purl, League\Url, Sabre\Uri) basically do the same thing, which surprises me a bit.

Why do (all?) browsers get it "right", while (all?) PHP libraries get it "wrong"?

Other than trying to catch these cases with a regular expression before parsing the URL (which may be unreliable - that's why I want to use a library in the first place), what alternatives do I have?

(*) I consulted three sources (RFC 1738, RFC 3986 and the WHATWG URL Standard), and all three disagree on what is considered valid.

RoToRa asked Nov 27 '22 06:11



1 Answer

In case you still want to apply a regular expression, the following should generate the URL you are looking for:

$url = preg_replace('~^([^/?#]+://[^/?#]+)\?~', '$1/?', $url);

It requires the URL to start with a protocol name of at least one character followed by "://" and a host name of at least one character ("localhost" would be acceptable too). It then inserts a '/' before the '?', but only if no '/' occurs between the host and the '?'.
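To make the idea easier to try out, here is a minimal sketch that wraps the replacement in a function and passes the result to parse_url. The function name add_missing_root_path is hypothetical (it is not part of PHP or of any of the libraries mentioned in the question), and the pattern is only one possible way to express the rule described above; it uses "~" as the delimiter so that "/" needs no escaping.

```php
<?php
// Hypothetical helper for illustration: inserts the missing "/" path
// between the host and a "?" that directly follows it.
function add_missing_root_path(string $url): string
{
    // scheme, "://", host (no "/", "?" or "#"), immediately followed by "?"
    $fixed = preg_replace('~^([^/?#]+://[^/?#]+)\?~', '$1/?', $url, 1);

    // preg_replace returns null on error; fall back to the input in that case
    return $fixed ?? $url;
}

$fixed = add_missing_root_path('http://example.com?q=a/b');
echo $fixed, "\n"; // http://example.com/?q=a/b

// With the path in place, parse_url separates host, path and query
var_dump(parse_url($fixed));
```

URLs that already contain a path, such as http://example.com/?q=a/b, do not match the pattern and are returned unchanged, so the helper should be safe to apply to every URL before parsing.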

Carsten Massmann answered Nov 28 '22 18:11
