
URLConnection does not handle content length via proxy correctly

I ran into the following problem: when URLConnection is used via a proxy, the content length is always reported as -1.

First, I checked that the proxy really returns Content-Length (lynx and wget both work through the proxy, which is the only route from the local network to the internet):

$ lynx -source -head ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
HTTP/1.1 200 OK
Last-Modified: Mon, 09 Jul 2007 17:02:37 GMT
Content-Type: application/x-zip-compressed
Content-Length: 30745
Connection: close
Date: Thu, 02 Feb 2012 17:18:52 GMT

$ wget -S -X HEAD ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
--2012-04-03 19:36:54--  ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
Resolving proxy... 10.10.0.12
Connecting to proxy|10.10.0.12|:8080... connected.
Proxy request sent, awaiting response...
  HTTP/1.1 200 OK
  Last-Modified: Mon, 09 Jul 2007 17:02:37 GMT
  Content-Type: application/x-zip-compressed
  Content-Length: 30745
  Connection: close
  Age: 0
  Date: Tue, 03 Apr 2012 17:36:54 GMT
Length: 30745 (30K) [application/x-zip-compressed]
Saving to: `WO2003-104476-001.zip'

In Java I wrote:

URL url = new URL("ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip");
int length = url.openConnection().getContentLength();
logger.debug("Got length: " + length);

and I get -1. I started debugging FtpURLConnection, and it turned out that the necessary information is in the underlying HttpURLConnection.responses field; however, the content length is never populated from there:

[Debugger screenshot: the response headers include Content-Length: 30745.]

The content length is not updated when you start reading the stream, or even after the stream has been read. Code:

URL url = new URL("ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip");
URLConnection connection = url.openConnection();

logger.debug("Got length (1): " + connection.getContentLength());

InputStream input = connection.getInputStream();

byte[] buffer = new byte[4096];
int count = 0, len;
while ((len = input.read(buffer)) > 0) {
    count += len;
}

logger.debug("Got length (2): " + connection.getContentLength() + " but wanted " + count);

Output:

Got length (1): -1
Got length (2): -1 but wanted 30745

It looks like a bug in JDK 6, so I have opened a new bug, #7168608.

  • If somebody can help me write code that returns the correct content length for a direct FTP connection, an FTP connection via proxy, and local file:/ URLs, I would appreciate it.
  • If the problem cannot be worked around with JDK 6, please suggest another library that definitely works for all the cases I've mentioned (Apache HttpClient?).
asked Oct 23 '22 06:10 by dma_k


1 Answer

Remember that proxies often change the representation of the underlying entity. In your case, I suspect the proxy is altering the transfer encoding, which in turn makes the Content-Length meaningless even if it is supplied.

You are running afoul of the following two sections of the HTTP/1.1 spec (RFC 2616):

4.4 Message Length

  1. ...
  2. ...
  3. If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the transfer-length. The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present). If a message is received with both a Transfer-Encoding header field and a Content-Length header field, the latter MUST be ignored.

14.41 Transfer-Encoding

The Transfer-Encoding general-header field indicates what (if any) type of transformation has been applied to the message body in order to safely transfer it between the sender and the recipient. This differs from the content-coding in that the transfer-coding is a property of the message, not of the entity.

Transfer-Encoding       = "Transfer-Encoding" ":" 1#transfer-coding

Transfer-codings are defined in section 3.6. An example is:

Transfer-Encoding: chunked

If multiple encodings have been applied to an entity, the transfer-codings MUST be listed in the order in which they were applied. Additional information about the encoding parameters MAY be provided by other entity-header fields not defined by this specification.

Many older HTTP/1.0 applications do not understand the Transfer-Encoding header.

If that is the case, URLConnection is ignoring the Content-Length header, as the spec requires, because the header is meaningless for a chunked transfer.
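To make rule 3 of section 4.4 concrete, here is a minimal Java sketch of how a client could decide whether a Content-Length value is usable. The class and method names (MessageLengthRule, firstValue, effectiveContentLength) are hypothetical and are not part of the JDK. The sketch only assumes a header map shaped like the one URLConnection.getHeaderFields() returns, and it does not claim to mirror what the JDK does internally.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the JDK: applies RFC 2616 section 4.4, rule 3
// to a header map shaped like the one URLConnection.getHeaderFields() returns.
class MessageLengthRule {

    // Case-insensitive lookup of the first value of a header; null if absent.
    static String firstValue(Map<String, List<String>> headers, String name) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String key = e.getKey(); // getHeaderFields() maps the status line to a null key
            List<String> values = e.getValue();
            if (key != null && key.equalsIgnoreCase(name)
                    && values != null && !values.isEmpty()) {
                return values.get(0);
            }
        }
        return null;
    }

    // Returns the usable Content-Length, or -1 when it is missing or must be ignored.
    static long effectiveContentLength(Map<String, List<String>> headers) {
        if (firstValue(headers, "Transfer-Encoding") != null) {
            return -1; // both headers present: Content-Length MUST be ignored
        }
        String value = firstValue(headers, "Content-Length");
        if (value == null) {
            return -1;
        }
        try {
            return Long.parseLong(value.trim());
        } catch (NumberFormatException ex) {
            return -1; // malformed value, treat as unknown
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> plain = new HashMap<>();
        plain.put("Content-Length", Arrays.asList("30745"));
        System.out.println(effectiveContentLength(plain));   // prints 30745

        Map<String, List<String>> chunked = new HashMap<>(plain);
        chunked.put("Transfer-Encoding", Arrays.asList("chunked"));
        System.out.println(effectiveContentLength(chunked)); // prints -1
    }
}
```

Running the same check against the headers visible in the debugger would show whether a Transfer-Encoding header is what suppresses the length.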

In your debugger screenshot it's not clear whether the Transfer-Encoding header is present. Please let us know...

On further investigation, it seems that lynx does not show all the headers returned when you run lynx -head. In particular, it does not show the Transfer-Encoding header, which is critical to this discussion.

Here's proof of the discrepancy using a publicly visible website:

Ξ▶ lynx -useragent='dummy' -source -head http://www.bbc.co.uk
HTTP/1.1 302 Found
Server: Apache
X-Cache-Action: PASS (non-cacheable)
X-Cache-Age: 0
Content-Type: text/html; charset=iso-8859-1
Date: Tue, 03 Apr 2012 13:33:06 GMT
Location: http://www.bbc.co.uk/mobile/
Connection: close

Ξ▶ wget -useragent='dummy' -S -X HEAD http://www.bbc.co.uk
--2012-04-03 14:33:22--  http://www.bbc.co.uk/
Resolving www.bbc.co.uk... 212.58.244.70
Connecting to www.bbc.co.uk|212.58.244.70|:80... connected.
HTTP request sent, awaiting response... 
HTTP/1.1 200 OK
Server: Apache
Cache-Control: private, max-age=15
Etag: "7e0f292b2e5e4c33cac1bc033779813b"
Content-Type: text/html
Transfer-Encoding: chunked
Date: Tue, 03 Apr 2012 13:33:22 GMT
Connection: keep-alive
X-Cache-Action: MISS
X-Cache-Age: 0
X-LB-NoCache: true
Vary: Cookie

Since I am obviously not inside your network I can't replicate your exact circumstances, but please validate that you really aren't getting a Transfer-Encoding header when passing through a proxy.
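Whatever the proxy turns out to send, one possible workaround for the three cases in the question (direct FTP, FTP via proxy, file: URLs) is to trust the reported length when there is one and otherwise count the bytes, as the question's own loop already does. The sketch below is only an illustration under assumptions: ContentLengthFallback, countBytes and resolveLength are hypothetical names, getContentLengthLong() and try-with-resources need Java 7 or later (not the JDK 6 from the question), and the fallback downloads the whole resource just to learn its size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the JDK.
class ContentLengthFallback {

    // Reads the stream to the end and returns the number of bytes seen.
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int len;
        while ((len = in.read(buffer)) != -1) {
            total += len;
        }
        return total;
    }

    // Uses the reported length when there is one; otherwise reads and counts.
    static long resolveLength(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        // Opening the stream first ensures it is always closed afterwards.
        try (InputStream in = connection.getInputStream()) {
            long reported = connection.getContentLengthLong(); // Java 7+
            return reported >= 0 ? reported : countBytes(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Local demonstration with a file: URL, so no network or proxy is needed.
        Path tmp = Files.createTempFile("length-demo", ".bin");
        try {
            Files.write(tmp, new byte[30745]);
            System.out.println(resolveLength(tmp.toUri().toURL())); // prints 30745
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

For large files the counting fallback is expensive, so it is worth logging when it is taken and caching the result if the same URL is queried repeatedly.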

answered Nov 14 '22 23:11 by sw1nn