In PHP, I've written a proxy function that accepts a URL, user agent, and other settings. The function makes a cURL request for the page and prints the response, with the proper HTML content-type headers, into an iframe (this is necessary only because I need to change some headers).
The proxied output often contains lots of assets with relative URLs, and these inherit the hostname of my site rather than that of the proxied site:
Example: [http://MYSITE.com/proxy?url=http://somesite.com] would return the HTML of [http://somesite.com]
In the response HTML, there is markup like this:
<link rel="apple-touch-icon-precomposed" sizes="144x144" href="assets/ico/apple-touch-icon-144-precomposed.png">
The problem:
Instead of looking for that asset at http://somesite.com/assets/ico/apple-touch-icon-144-precomposed.png, the browser tries to find it at http://MYSITE.com/assets/ico/apple-touch-icon-144-precomposed.png, which is wrong.
The Question:
What do I need to do to get their relative-path assets to load properly via the proxy?
How about the <base> tag? You can place it in the head, and it will tell the browser what to use as the base URL for all relative URLs on the page:
<head>
<base href="http://somesite.com/">
</head>
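You could add it to each page that you serve with DOMDocument (this version uses item(0) rather than array dereferencing, so it does not depend on PHP 5.4):
if ($contentType == 'text/html') {
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; don't emit warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $head = $doc->getElementsByTagName('head')->item(0);
    // Add a <base> only if the page has a <head> and no <base> of its own
    if ($head !== null && $head->getElementsByTagName('base')->length == 0) {
        $base = $doc->createElement('base');
        $base->setAttribute('href', $urlOfPageDir);
        // Put it first: <base> only affects URLs that come after it
        $head->insertBefore($base, $head->firstChild);
    }
    echo $doc->saveHTML();
}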
Take note that $urlOfPageDir must be the absolute URL of the directory in which the page resides. See this SO question for more on the base tag: Is it recommended to use the <base> html tag?
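One way $urlOfPageDir could be derived from the proxied URL is sketched below. The function name pageDirUrl is hypothetical (it is not part of the question or of PHP), and the sketch assumes PHP 7 or later and a well-formed absolute URL; it keeps the scheme, host, optional port, and the path up to and including the last slash.

```php
<?php
// Hypothetical helper: returns the absolute URL of the directory
// containing the page at $url, always ending in a slash.
function pageDirUrl(string $url): string
{
    $parts  = parse_url($url);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    // parse_url() omits 'path' for URLs like http://somesite.com
    $path   = $parts['path'] ?? '/';

    // Keep everything up to and including the last slash;
    // the query string and fragment are dropped.
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    return $scheme . '://' . $host . $port . $dir;
}

// Example use inside the proxy (assumes $url holds the requested page):
// $urlOfPageDir = pageDirUrl($url);
```

With this, a request for http://somesite.com/blog/post.html would yield http://somesite.com/blog/ as the base, so an href such as assets/ico/icon.png resolves against the proxied site rather than your own host. URLs that carry credentials or other unusual components are not handled by this sketch.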