I am retrieving a page from another host, and then initializing the form with data from a database before sending it on to the user.
I need to make the URLs in href and src attributes absolute, so that browsers load them from the right place.
Can I set an HTTP header to cause this to happen without modifying the HTML?
To find the base URL of your website, go to the site's front page. What you see in the address bar on your site's front page is the base URL of your website.
The <base> tag specifies the base URL and/or target for all relative URLs in a document. The <base> tag must have an href attribute, a target attribute, or both. There can be at most one <base> element in a document, and it must be inside the <head> element.
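The following is a rough illustration of what the href attribute of <base> does. Python's urllib.parse.urljoin resolves a relative reference against a base URL using the RFC 3986 algorithm. Browsers follow the WHATWG URL Standard, which gives the same results for simple cases like these. The example.com URLs and the resolve_all helper are placeholders.

```python
from urllib.parse import urljoin


def resolve_all(base: str, refs: list[str]) -> dict[str, str]:
    """Resolve each reference against base, as a <base href="..."> would."""
    # urljoin leaves already-absolute references unchanged.
    return {ref: urljoin(base, ref) for ref in refs}


# Hypothetical base URL, as if the page contained <base href="http://example.com/app/">
links = resolve_all(
    "http://example.com/app/",
    ["img/logo.png", "/css/site.css", "../other/page.html", "http://cdn.example.net/x.js"],
)
# links["img/logo.png"]       == "http://example.com/app/img/logo.png"
# links["/css/site.css"]      == "http://example.com/css/site.css"
# links["../other/page.html"] == "http://example.com/other/page.html"
```

Note the trailing slash. With a base of http://example.com/app (no slash), img/logo.png resolves to http://example.com/img/logo.png, because the last path segment of the base is replaced.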
The HTTP Link header: the Link: header in HTTP allows the server to point an interested client to another resource containing metadata about the requested resource, for example Link: <meta.rdf>; rel=meta.
As an alternative, you can use HTML's <base> tag, which has an href attribute for this exact purpose:
This attribute specifies an absolute URI that acts as the base URI for resolving relative URIs.
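In the question's setup (fetch a page from another host, fill in the form, send it on), the <base> element can be added on the way through. Below is a minimal sketch that assumes the fetched page contains a literal <head> tag. It uses a regular expression rather than a real HTML parser. The function name inject_base and the URLs are hypothetical. The element is placed first in <head> because HTML requires <base href> to come before any other element with URL attributes.

```python
import html
import re

# Opening <head> tag with optional attributes; \b stops it from matching <header>.
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def inject_base(page: str, base_url: str) -> str:
    """Insert <base href="base_url"> as the first child of <head> (sketch).

    Assumes the page has a literal <head> tag and no existing <base> element.
    """
    tag = f'<base href="{html.escape(base_url, quote=True)}">'
    match = _HEAD_OPEN.search(page)
    if match is None:
        # Refuse rather than guess where the element should go.
        raise ValueError("no <head> tag found")
    return page[: match.end()] + tag + page[match.end():]
```

Usage would be something like inject_base(fetched_html, "http://origin.example/path/") before writing the response. If the page already has a <base href>, browsers use only the first one, so an injected element placed earlier in <head> takes precedence.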
Per HTML and URLs on W3C:
User agents should calculate the base URL for resolving relative URLs according to the [RFC1808]. The following is a summary of how [RFC1808] applies to HTML. User agents should calculate the base URL according to the following precedences (highest priority to lowest):
- The base URL is set by the BASE element.
- The base URL is given by an HTTP header (see [RFC2068]).
- By default, the base URL is that of the current document.
Additionally, the OBJECT and APPLET elements define attributes that take precedence over the value set by the BASE element. Please consult the definitions of these elements for more information about URL issues specific to them.
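The precedence list from the W3C text can be sketched as a small function. Each higher-priority value is resolved against the lower-priority one, so the result is always absolute. The function name, parameter names and example hosts are hypothetical. Resolving a relative BASE href is an assumption here, since HTML 4 requires it to be absolute. The OBJECT/APPLET special cases are not modelled.

```python
from typing import Optional
from urllib.parse import urljoin


def effective_base(document_url: str,
                   header_base: Optional[str] = None,
                   base_href: Optional[str] = None) -> str:
    """Base URL by precedence: BASE element > HTTP header > document URL (sketch)."""
    base = document_url  # lowest priority: the URL of the current document
    if header_base:
        # An HTTP-supplied base overrides the document URL.
        base = urljoin(base, header_base)
    if base_href:
        # The BASE element overrides both.
        base = urljoin(base, base_href)
    return base
```

For example, effective_base("http://proxy.example/page", base_href="http://origin.example/dir/") returns "http://origin.example/dir/", so a relative img.png in that page would load from http://origin.example/dir/img.png.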
RFC 2068 is the original spec for HTTP 1.1. It defined Content-Base and Content-Location headers for the purpose of specifying an entity's base URL used for resolving relative URLs within the entity:
14.11 Content-Base
The Content-Base entity-header field may be used to specify the base URI for resolving relative URLs within the entity. This header field is described as Base in RFC 1808, which is expected to be revised.
Content-Base = "Content-Base" ":" absoluteURI
If no Content-Base field is present, the base URI of an entity is defined either by its Content-Location (if that Content-Location URI is an absolute URI) or the URI used to initiate the request, in that order of precedence. Note, however, that the base URI of the contents within the entity-body may be redefined within that entity-body.
14.15 Content-Location
The Content-Location entity-header field may be used to supply the resource location for the entity enclosed in the message. In the case where a resource has multiple entities associated with it, and those entities actually have separate locations by which they might be individually accessed, the server should provide a Content-Location for the particular variant which is returned. In addition, a server SHOULD provide a Content-Location for the resource corresponding to the response entity.
Content-Location = "Content-Location" ":" ( absoluteURI | relativeURI )
If no Content-Base header field is present, the value of Content-Location also defines the base URL for the entity (see section 14.11).
The Content-Location value is not a replacement for the original requested URI; it is only a statement of the location of the resource corresponding to this particular entity at the time of the request. Future requests MAY use the Content-Location URI if the desire is to identify the source of that particular entity.
A cache cannot assume that an entity with a Content-Location different from the URI used to retrieve it can be used to respond to later requests on that Content-Location URI. However, the Content-Location can be used to differentiate between multiple entities retrieved from a single requested resource, as described in section 13.6.
If the Content-Location is a relative URI, the URI is interpreted relative to any Content-Base URI provided in the response. If no Content-Base is provided, the relative URI is interpreted relative to the Request-URI.
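The RFC 2068 rules quoted above can be condensed into a short function. The function name rfc2068_base is hypothetical, and headers is assumed to be a plain name-to-value mapping. Section 14.11 literally only mentions an absolute Content-Location. This sketch follows 14.15 instead and resolves a relative one against the Request-URI.

```python
from typing import Mapping
from urllib.parse import urljoin


def rfc2068_base(request_uri: str, headers: Mapping[str, str]) -> str:
    """Base URI of a response entity per RFC 2068 sections 14.11 and 14.15 (sketch)."""
    # Header names are case-insensitive, so normalize them before looking up.
    lower = {name.lower(): value for name, value in headers.items()}
    content_base = lower.get("content-base")
    if content_base:
        # 14.11: Content-Base takes precedence (its grammar only allows an absoluteURI).
        return content_base
    content_location = lower.get("content-location")
    if content_location:
        # 14.15: without Content-Base, Content-Location defines the base URL;
        # a relative value is interpreted relative to the Request-URI.
        return urljoin(request_uri, content_location)
    # 14.11: otherwise the base is the URI used to initiate the request.
    return request_uri
```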
RFC 2068 is obsolete, replaced by RFC 2616, which at the time of writing was the HTTP 1.1 spec implemented by most web servers. It removes the Content-Base header from the HTTP 1.1 spec entirely, and slightly redefines the semantics of Content-Location:
14.14 Content-Location
The Content-Location entity-header field MAY be used to supply the resource location for the entity enclosed in the message when that entity is accessible from a location separate from the requested resource's URI. A server SHOULD provide a Content-Location for the variant corresponding to the response entity; especially in the case where a resource has multiple entities associated with it, and those entities actually have separate locations by which they might be individually accessed, the server SHOULD provide a Content-Location for the particular variant which is returned.
Content-Location = "Content-Location" ":" ( absoluteURI | relativeURI )
The value of Content-Location also defines the base URI for the entity.
The Content-Location value is not a replacement for the original requested URI; it is only a statement of the location of the resource corresponding to this particular entity at the time of the request. Future requests MAY specify the Content-Location URI as the request-URI if the desire is to identify the source of that particular entity.
A cache cannot assume that an entity with a Content-Location different from the URI used to retrieve it can be used to respond to later requests on that Content-Location URI. However, the Content-Location can be used to differentiate between multiple entities retrieved from a single requested resource, as described in section 13.6.
If the Content-Location is a relative URI, the relative URI is interpreted relative to the Request-URI.
The meaning of the Content-Location header in PUT or POST requests is undefined; servers are free to ignore it in those cases.
It is important to note that "The value of Content-Location also defines the base URI for the entity" still applies at this point.
Moving forward, RFC 2616 has been obsoleted by RFCs 7230-7235 (which were not yet widely implemented at the time of writing). In particular, RFC 7231 completely redefines the semantics of Content-Location:
3.1.4.2. Content-Location
The "Content-Location" header field references a URI that can be used as an identifier for a specific resource corresponding to the representation in this message's payload. In other words, if one were to perform a GET request on this URI at the time of this message's generation, then a 200 (OK) response would contain the same representation that is enclosed as payload in this message.
Content-Location = absolute-URI / partial-URI
The Content-Location value is not a replacement for the effective Request URI (Section 5.5 of [RFC7230]). It is representation metadata. It has the same syntax and semantics as the header field of the same name defined for MIME body parts in Section 4 of [RFC2557]. However, its appearance in an HTTP message has some special implications for HTTP recipients.
If Content-Location is included in a 2xx (Successful) response message and its value refers (after conversion to absolute form) to a URI that is the same as the effective request URI, then the recipient MAY consider the payload to be a current representation of that resource at the time indicated by the message origination date. For a GET (Section 4.3.1) or HEAD (Section 4.3.2) request, this is the same as the default semantics when no Content-Location is provided by the server. For a state-changing request like PUT (Section 4.3.4) or POST (Section 4.3.3), it implies that the server's response contains the new representation of that resource, thereby distinguishing it from representations that might only report about the action (e.g., "It worked!"). This allows authoring applications to update their local copies without the need for a subsequent GET request.
If Content-Location is included in a 2xx (Successful) response message and its field-value refers to a URI that differs from the effective request URI, then the origin server claims that the URI is an identifier for a different resource corresponding to the enclosed representation. Such a claim can only be trusted if both identifiers share the same resource owner, which cannot be programmatically determined via HTTP.
- For a response to a GET or HEAD request, this is an indication that the effective request URI refers to a resource that is subject to content negotiation and the Content-Location field-value is a more specific identifier for the selected representation.
- For a 201 (Created) response to a state-changing method, a Content-Location field-value that is identical to the Location field-value indicates that this payload is a current representation of the newly created resource.
- Otherwise, such a Content-Location indicates that this payload is a representation reporting on the requested action's status and that the same report is available (for future access with GET) at the given URI. For example, a purchase transaction made via a POST request might include a receipt document as the payload of the 200 (OK) response; the Content-Location field-value provides an identifier for retrieving a copy of that same receipt in the future.
A user agent that sends Content-Location in a request message is stating that its value refers to where the user agent originally obtained the content of the enclosed representation (prior to any modifications made by that user agent). In other words, the user agent is providing a back link to the source of the original representation.
An origin server that receives a Content-Location field in a request message MUST treat the information as transitory request context rather than as metadata to be saved verbatim as part of the representation. An origin server MAY use that context to guide in processing the request or to save it for other uses, such as within source links or versioning metadata. However, an origin server MUST NOT use such context information to alter the request semantics. For example, if a client makes a PUT request on a negotiated resource and the origin server accepts that PUT (without redirection), then the new state of that resource is expected to be consistent with the one representation supplied in that PUT; the Content-Location cannot be used as a form of reverse content selection identifier to update only one of the negotiated representations. If the user agent had wanted the latter semantics, it would have applied the PUT directly to the Content-Location URI.
Most importantly, RFC 7231 also states:
Appendix B. Changes from RFC 2616 ... The definition of Content-Location has been changed to no longer affect the base URI for resolving relative URI references, due to poor implementation support and the undesirable effect of potentially breaking relative links in content-negotiated resources. (Section 3.1.4.2) ...
So, in answer to the question that was asked:
- As of RFC 2616, the answer is YES: Content-Location exists to specify an entity's base URL at the HTTP level.
- As of RFC 7231, the answer is NO: Content-Location can no longer be used to specify an entity's base URL.
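The difference between the two answers can be sketched in Python. The function name base_from_headers and its spec parameter are hypothetical. The sketch models only what each RFC says about the header, not what browsers actually do. RFC 7231 Appendix B cites poor implementation support for the RFC 2616 behavior, so a real user agent may ignore Content-Location either way.

```python
from typing import Mapping
from urllib.parse import urljoin


def base_from_headers(request_uri: str, headers: Mapping[str, str], spec: str) -> str:
    """Base URI implied by HTTP headers under RFC "2616" or RFC "7231" (sketch)."""
    if spec not in ("2616", "7231"):
        raise ValueError(f"unsupported spec: {spec!r}")
    # Header names are case-insensitive.
    lower = {name.lower(): value for name, value in headers.items()}
    content_location = lower.get("content-location")
    if spec == "2616" and content_location:
        # RFC 2616 14.14: Content-Location defines the base URI; a relative
        # value is interpreted relative to the Request-URI.
        return urljoin(request_uri, content_location)
    # RFC 7231 (or RFC 2616 without the header): the request URI is the base,
    # unless the entity itself overrides it, e.g. with <base href>.
    return request_uri
```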
AFAIK, as of RFC 7231, no new or existing HTTP header has been defined to restore the base URL behavior. So there is no longer an HTTP header available for specifying a base URL. It can only be specified by the entity itself, if it needs to be different from the entity's request URL.
There is no such header in HTTP. But you can set the base URL with HTML’s BASE element, like this:
<base href="http://example.com/">