I faced the following problem: When URLConnection
is used via proxy the content length is always set to -1
.
First I checked that proxy really returns the Content-Length
(lynx
and wget
are also working via proxy; there is no other way to go to internet from local network):
$ lynx -source -head ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
HTTP/1.1 200 OK
Last-Modified: Mon, 09 Jul 2007 17:02:37 GMT
Content-Type: application/x-zip-compressed
Content-Length: 30745
Connection: close
Date: Thu, 02 Feb 2012 17:18:52 GMT
$ wget -S -X HEAD ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
--2012-04-03 19:36:54-- ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
Resolving proxy... 10.10.0.12
Connecting to proxy|10.10.0.12|:8080... connected.
Proxy request sent, awaiting response...
HTTP/1.1 200 OK
Last-Modified: Mon, 09 Jul 2007 17:02:37 GMT
Content-Type: application/x-zip-compressed
Content-Length: 30745
Connection: close
Age: 0
Date: Tue, 03 Apr 2012 17:36:54 GMT
Length: 30745 (30K) [application/x-zip-compressed]
Saving to: `WO2003-104476-001.zip'
In Java I wrote:
URL url = new URL("ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip");
int length = url.openConnection().getContentLength();
logger.debug("Got length: " + length);
and I get -1
. I started to debug FtpURLConnection
and it turned out that the necessary information is in underlying HttpURLConnection.responses
field however it is never properly populated from there:
(there is Content-Length: 30745
in headers). The content length is not updated when you start reading the stream or even after the stream was read. Code:
URL url = new URL("ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip");
URLConnection connection = url.openConnection();
logger.debug("Got length (1): " + connection.getContentLength());
InputStream input = connection.getInputStream();
byte[] buffer = new byte[4096];
int count = 0, len;
while ((len = input.read(buffer)) > 0) {
count += len;
}
logger.debug("Got length (2): " + connection.getContentLength() + " but wanted " + count);
Output:
Got length (1): -1
Got length (2): -1 but wanted 30745
It seems like it is a bug in JDK6, so I have opened new bug#7168608.
file:/
URLs I would appreciate.Remember that proxies will often change the representation of the underlying entity. In your case I suspect the proxy is probably altering the transfer encoding. Which in turn makes the Content-Length meaningless even if supplied.
You are falling afoul of the following two sections of the HTTP 1.1 spec:
4.4 Message Length
- ...
- ...
- If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the transfer-length. The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present). If a message is received with both a Transfer-Encoding header field and a Content-Length header field, the latter MUST be ignored.
14.41 Transfer-Encoding
The Transfer-Encoding general-header field indicates what (if any) type of transformation has been applied to the message body in order to safely transfer it between the sender and the recipient. This differs from the content-coding in that the transfer-coding is a property of the message, not of the entity.
Transfer-Encoding = "Transfer-Encoding" ":" 1#transfer-coding
Transfer-codings are defined in section 3.6. An example is:
Transfer-Encoding: chunked
If multiple encodings have been applied to an entity, the transfer- codings MUST be listed in the order in which they were applied. Additional information about the encoding parameters MAY be provided by other entity-header fields not defined by this specification.
Many older HTTP/1.0 applications do not understand the Transfer- Encoding header.
So The URLConnection is then ignoring the Content-Length
header, as per the spec because it is meaningless in the presence of chunked transfers
In your debugger screenshot it's not clear whether the Transfer-Encoding
header is present. Please let us know...
On further investigation - it seems that lynx does not show all the headers returned when you issue a lynx -head
. It is not showing the Transfer-Encoding
header critical to this discussion.
Here's the proof of the discrepancy with a publically visible website
Ξ▶ lynx -useragent='dummy' -source -head http://www.bbc.co.uk
HTTP/1.1 302 Found
Server: Apache
X-Cache-Action: PASS (non-cacheable)
X-Cache-Age: 0
Content-Type: text/html; charset=iso-8859-1
Date: Tue, 03 Apr 2012 13:33:06 GMT
Location: http://www.bbc.co.uk/mobile/
Connection: close
Ξ▶ wget -useragent='dummy' -S -X HEAD http://www.bbc.co.uk
--2012-04-03 14:33:22-- http://www.bbc.co.uk/
Resolving www.bbc.co.uk... 212.58.244.70
Connecting to www.bbc.co.uk|212.58.244.70|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: Apache
Cache-Control: private, max-age=15
Etag: "7e0f292b2e5e4c33cac1bc033779813b"
Content-Type: text/html
Transfer-Encoding: chunked
Date: Tue, 03 Apr 2012 13:33:22 GMT
Connection: keep-alive
X-Cache-Action: MISS
X-Cache-Age: 0
X-LB-NoCache: true
Vary: Cookie
Since I am obviously not inside your network I can't replicate your exact circumstances, but please validate that you really aren't getting a Transfer-Encoding header when passing through a proxy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With