I would like to understand how a <base href="" /> value should be handled by my web crawler, so I tested several combinations in the major browsers and finally found something with double slashes that I don't understand.
If you don't want to read everything, jump to the test results of D and E. Demonstration of all tests:
http://gutt.it/basehref.php
Step by step, my test results when calling http://example.com/images.html:
A - Multiple base href
<html>
<head>
<base target="_blank" />
<base href="http://example.com/images/" />
<base href="http://example.com/" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg">
<img src="./image.jpg">
<img src="images/image.jpg"> not found
<img src="/image.jpg"> not found
<img src="../image.jpg"> not found
</body>
</html>
Conclusion
Only the first <base> with href counts
A source starting with / targets the root
../ goes one folder up
B - Without trailing slash
<html>
<head>
<base href="http://example.com/images" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg"> not found
<img src="./image.jpg"> not found
<img src="images/image.jpg">
<img src="/image.jpg"> not found
<img src="../image.jpg"> not found
</body>
</html>
Conclusion
<base href> ignores everything after the last slash, so http://example.com/images becomes http://example.com/
C - How it should be
<html>
<head>
<base href="http://example.com/" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg"> not found
<img src="./image.jpg"> not found
<img src="images/image.jpg">
<img src="/image.jpg"> not found
<img src="../image.jpg"> not found
</body>
</html>
Conclusion
Same results as in test B
D - Double Slash
<html>
<head>
<base href="http://example.com/images//" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg">
<img src="./image.jpg">
<img src="images/image.jpg"> not found
<img src="/image.jpg"> not found
<img src="../image.jpg">
</body>
</html>
E - Double Slash with whitespace
<html>
<head>
<base href="http://example.com/images/ /" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg"> not found
<img src="./image.jpg"> not found
<img src="images/image.jpg"> not found
<img src="/image.jpg"> not found
<img src="../image.jpg">
</body>
</html>
Neither is a "valid" URL, but these are real results from my web crawler. Please explain what happened in D and E so that ../image.jpg could be found, and why the whitespace makes a difference.
Just for your interest:
<base href="http://example.com//" /> is the same as test C
<base href="http://example.com/ /" /> is completely different: only ../image.jpg is found
<base href="a/" /> finds only /images/image.jpg
A double slash in the URL path is valid and will respond in the browser, but is typically unwelcome, as this could cause duplicate content issues if the CMS delivers the same content on two URLs (i.e. single slash and double slash).
If the double slash in the page's permalink is generated by your CMS, you might need to ask your developer for help. If the URL with a double slash is indexed in Google or has incoming external links, you can set up 301 redirects to the corrected URL.
The "two forward slashes" are a common shorthand for "request the referenced resource using whatever protocol is being used to load the current page".
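A minimal sketch of that shorthand, assuming a JavaScript runtime that provides the standard WHATWG URL class (modern browsers, Node.js). The host cdn.example.com, the file names, and the helper name resolveSchemeRelative are made up for illustration:

```javascript
// Resolve a scheme-relative ("protocol-relative") reference against a page URL.
// Hypothetical helper; it only wraps the standard URL constructor.
function resolveSchemeRelative(reference, pageUrl) {
  return new URL(reference, pageUrl).href;
}

// The same reference picks up the scheme of whichever page contains it.
const overHttps = resolveSchemeRelative('//cdn.example.com/lib.js', 'https://example.com/page.html');
const overHttp = resolveSchemeRelative('//cdn.example.com/lib.js', 'http://example.com/page.html');

console.log(overHttps); // https://cdn.example.com/lib.js
console.log(overHttp);  // http://cdn.example.com/lib.js
```

This only applies when the reference itself starts with //. A double slash further inside a path is not a protocol shorthand; it is just an empty path segment.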
Particularly as a double slash in written work usually means "new line here".
The behavior of base is explained in the HTML spec:
"The base element allows authors to specify the document base URL for the purposes of resolving relative URLs."
As shown in your test A, if there are multiple base elements with href, the document base URL will be taken from the first one.
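For a crawler, that rule could be sketched roughly as follows. This is a simplified illustration, not the spec algorithm verbatim: pickBaseHref and the shape of baseElements (plain objects holding each element's attributes, in document order) are assumptions about what your HTML parser hands you, and error handling for unparsable href values is left out.

```javascript
// Pick the document base URL from the <base> elements of a page.
// `baseElements` is a hypothetical array of attribute objects in document order.
function pickBaseHref(baseElements, documentUrl) {
  // Only the first <base> that actually has an href attribute is used.
  const first = baseElements.find((el) => typeof el.href === 'string');
  if (!first) return documentUrl; // no <base href>: fall back to the page URL
  // The href may itself be relative, so resolve it against the page URL.
  return new URL(first.href, documentUrl).href;
}

// The three <base> elements from test A.
const testA = [
  { target: '_blank' },
  { href: 'http://example.com/images/' },
  { href: 'http://example.com/' },
];
const baseForTestA = pickBaseHref(testA, 'http://example.com/images.html');
console.log(baseForTestA); // http://example.com/images/
```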
Resolving relative URLs is done this way:
Apply the URL parser to url, with base as the base URL, with encoding as the encoding.
The URL parsing algorithm is defined in the URL spec.
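It's too complex to be explained here in detail. But basically, this is what happens:
A relative URL starting with / is calculated with respect to the base URL's host.
Otherwise, the relative URL is calculated with respect to the base URL's last directory.
Be aware that if the base path doesn't end with /, the last part will be a file, not a directory.
./ is the current directory.
../ goes one directory up.
(Probably, "directory" and "file" are not the proper terminology in URLs.)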
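The rules above can be checked outside the page with the standard URL constructor, which applies the same URL parser. A small sketch, assuming a runtime with the WHATWG URL class; the names resolve, withSlash, withoutSlash and rules are only for illustration:

```javascript
const withSlash = 'http://example.com/images/';   // last part is a "directory"
const withoutSlash = 'http://example.com/images'; // last part is a "file"

// Resolve a relative reference against a base URL.
const resolve = (ref, base) => new URL(ref, base).href;

const rules = {
  rootRelative: resolve('/image.jpg', withSlash),     // relative to the host
  directory: resolve('image.jpg', withSlash),         // relative to /images/
  file: resolve('image.jpg', withoutSlash),           // "images" is dropped
  currentDir: resolve('./image.jpg', withSlash),      // stays in /images/
  parentDir: resolve('../image.jpg', withSlash),      // one level up
};

console.log(rules.rootRelative); // http://example.com/image.jpg
console.log(rules.directory);    // http://example.com/images/image.jpg
console.log(rules.file);         // http://example.com/image.jpg
console.log(rules.currentDir);   // http://example.com/images/image.jpg
console.log(rules.parentDir);    // http://example.com/image.jpg
```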
Some examples:
http://example.com/images/a/./ is http://example.com/images/a/
http://example.com/images/a/../ is http://example.com/images/
http://example.com/images//./ is http://example.com/images//
http://example.com/images//../ is http://example.com/images/
http://example.com/images/./ is http://example.com/images/
http://example.com/images/../ is http://example.com/
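These examples can be reproduced by letting the URL parser normalize the dot segments. A short sketch, again assuming a runtime with the WHATWG URL class; the variable names are illustrative:

```javascript
// Each pair is [input URL, expected URL after dot-segment removal].
const examples = [
  ['http://example.com/images/a/./', 'http://example.com/images/a/'],
  ['http://example.com/images/a/../', 'http://example.com/images/'],
  ['http://example.com/images//./', 'http://example.com/images//'],
  ['http://example.com/images//../', 'http://example.com/images/'],
  ['http://example.com/images/./', 'http://example.com/images/'],
  ['http://example.com/images/../', 'http://example.com/'],
];

// Parsing an absolute URL already removes "." and ".." segments.
const normalized = examples.map(([input]) => new URL(input).href);

examples.forEach(([input, expected], i) => {
  console.log(input + ' is ' + normalized[i] + (normalized[i] === expected ? '' : ' (differs)'));
});
```

Note that the empty segment between the two slashes counts as a directory level of its own, which is why //../ only removes the second slash.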
Note that, in most cases, // will be like /. As said by @poncha,
Unless you're using some kind of URL rewriting (in which case the rewriting rules may be affected by the number of slashes), the uri maps to a path on disk, but in (most?) modern operating systems (Linux/Unix, Windows), multiple path separators in a row do not have any special meaning, so /path/to/foo and /path//to////foo would eventually map to the same file.
However, in general / / won't become //.
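Applied to your tests D and E, a sketch with the standard URL constructor (assuming a runtime with the WHATWG URL class; baseD, baseE, refs, resolvedD and resolvedE are illustrative names) shows where the two cases diverge:

```javascript
const baseD = 'http://example.com/images//';  // test D: empty last directory
const baseE = 'http://example.com/images/ /'; // test E: last directory is a space
const refs = ['image.jpg', './image.jpg', '../image.jpg'];

const resolvedD = refs.map((ref) => new URL(ref, baseD).href);
const resolvedE = refs.map((ref) => new URL(ref, baseE).href);

console.log(resolvedD);
// [ 'http://example.com/images//image.jpg',
//   'http://example.com/images//image.jpg',
//   'http://example.com/images/image.jpg' ]
console.log(resolvedE);
// [ 'http://example.com/images/%20/image.jpg',
//   'http://example.com/images/%20/image.jpg',
//   'http://example.com/images/image.jpg' ]
```

In both cases ../ steps out of the extra directory level and lands in /images/, so ../image.jpg is found. In D, image.jpg and ./image.jpg resolve to /images//image.jpg, which is found only because the server treats // like /. In E the extra level is the segment %20 (the parser percent-encodes the space), which the server does not collapse, so those two are not found.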
You can use the following snippet to resolve your list of relative URLs to absolute ones:
// Base URLs from tests A to E
var bases = [
  "http://example.com/images/",
  "http://example.com/images",
  "http://example.com/",
  "http://example.com/images//",
  "http://example.com/images/ /"
];
// Relative URLs used in every test
var urls = [
  "/images/image.jpg",
  "image.jpg",
  "./image.jpg",
  "images/image.jpg",
  "/image.jpg",
  "../image.jpg"
];
// Create an element and append strings (as text nodes) or nodes to it
function newEl(type, contents) {
  var el = document.createElement(type);
  if (!contents) return el;
  if (!(contents instanceof Array))
    contents = [contents];
  for (var i = 0; i < contents.length; ++i) {
    if (typeof contents[i] == 'string')
      el.appendChild(document.createTextNode(contents[i]));
    else if (typeof contents[i] == 'object') // contents[i] instanceof Node
      el.appendChild(contents[i]);
  }
  return el;
}
// Classify a resolved URL: exact match, match with a double slash, or anything else
function emoticon(str) {
  return {
    'http://example.com/images/image.jpg': 'good',
    'http://example.com/images//image.jpg': 'neutral'
  }[str] || 'bad';
}
var base = document.createElement('base'),
    a = document.createElement('a'),
    output = document.createElement('ul'),
    head = document.getElementsByTagName('head')[0];
head.insertBefore(base, head.firstChild);
for (var i = 0; i < bases.length; ++i) {
  // Switch the document base URL, then let the browser resolve each relative URL
  base.href = bases[i];
  var test = newEl('li', [
    'Test ' + (i + 1) + ': ',
    newEl('span', bases[i])
  ]);
  test.className = 'test';
  var testItems = newEl('ul');
  testItems.className = 'test-items';
  for (var j = 0; j < urls.length; ++j) {
    a.href = urls[j];
    var absURL = a.cloneNode(false).href;
    /* Stupid old IE requires cloning
       https://stackoverflow.com/a/24437713/1529630 */
    var testItem = newEl('li', [
      newEl('span', urls[j]),
      ' → ',
      newEl('span', absURL)
    ]);
    testItem.className = 'test-item ' + emoticon(absURL);
    testItems.appendChild(testItem);
  }
  test.appendChild(testItems);
  output.appendChild(test);
}
document.body.appendChild(output);
span {
  background: #eef;
}
.test-items {
  display: table;
  border-spacing: .13em;
  padding-left: 1.1em;
  margin-bottom: .3em;
}
.test-item {
  display: table-row;
  position: relative;
  list-style: none;
}
.test-item > span {
  display: table-cell;
}
.test-item:before {
  display: inline-block;
  width: 1.1em;
  height: 1.1em;
  line-height: 1em;
  text-align: center;
  border-radius: 50%;
  margin-right: .4em;
  position: absolute;
  left: -1.1em;
  top: 0;
}
.good:before {
  content: ':)';
  background: #0f0;
}
.neutral:before {
  content: ':|';
  background: #ff0;
}
.bad:before {
  content: ':(';
  background: #f00;
}
You can also play with this snippet:
// Resolve url against baseurl by temporarily inserting a <base> element
var resolveURL = (function() {
  var base = document.createElement('base'),
      a = document.createElement('a'),
      head = document.getElementsByTagName('head')[0];
  return function(url, baseurl) {
    if (base) {
      base.href = baseurl;
      head.insertBefore(base, head.firstChild);
    }
    a.href = url;
    var abs = a.cloneNode(false).href;
    /* Stupid old IE requires cloning
       https://stackoverflow.com/a/24437713/1529630 */
    if (base)
      head.removeChild(base);
    return abs;
  };
})();
var base = document.getElementById('base'),
    url = document.getElementById('url'),
    abs = document.getElementById('absolute');
// Old IE: react to changes of the value property
base.onpropertychange = url.onpropertychange = function() {
  if (event.propertyName == "value")
    update()
};
// Other browsers: update on input, and run once on load
(base.oninput = url.oninput = update)();
function update() {
  abs.value = resolveURL(url.value, base.value);
}
label {
  display: block;
  margin: 1em 0;
}
input {
  width: 100%;
}
<label>
Base url:
<input id="base" value="http://example.com/images//foo////bar/baz"
placeholder="Enter your base url here" />
</label>
<label>
URL to be resolved:
<input id="url" value="./a/b/../c"
placeholder="Enter your URL here">
</label>
<label>
Resulting url:
<input id="absolute" readonly>
</label>