Browser and wget load JPEG differently?

I'm stumped on this one. Try loading this image in your browser, and then save it to your hard disk.

http://profile.ak.fbcdn.net/hprofile-ak-snc4/41674_660962816_995_n.jpg

It's a valid JPEG file at 11377 bytes.

Now try to download it with wget or curl. Only 11252 bytes show up, and the bottom right part of the image is missing.
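For reference, a plain fetch along these lines shows the truncated size (the exact flags are just for illustration):

$ curl -s -o curl.jpg http://profile.ak.fbcdn.net/hprofile-ak-snc4/41674_660962816_995_n.jpg
$ wc -c curl.jpg    # 11252, not the 11377 the browser saves
$ wget -O wget.jpg http://profile.ak.fbcdn.net/hprofile-ak-snc4/41674_660962816_995_n.jpg
$ wc -c wget.jpg    # same: 11252 bytes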

What gives?

asked Mar 31 '11 by Leopd

2 Answers

Here goes…

Taking a packet dump, I see that Facebook returns the same Content-Length to Safari as it does to curl, and that Content-Length is the incorrect 11252:

GET /hprofile-ak-snc4/41674_660962816_995_n.jpg HTTP/1.1
User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
Host: profile.ak.fbcdn.net
Accept: */*

HTTP/1.1 200 OK
Content-Type: image/jpeg
... snip ....
Content-Length: 11252

And with Safari:

GET /hprofile-ak-snc4/41674_660962816_995_n.jpg HTTP/1.1
Host: profile.ak.fbcdn.net
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-us) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27
... snip ...

HTTP/1.1 200 OK
Content-Type: image/jpeg
... snip ...
Content-Length: 11252
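(The answer doesn't say which tool produced these dumps; as a sketch, tcpdump with ASCII output would capture the same thing:)

$ # print HTTP traffic to/from the host in ASCII; -s0 captures full packets
$ sudo tcpdump -A -s0 'host profile.ak.fbcdn.net and tcp port 80'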

So I'm going to guess Facebook is sending an incorrect Content-Length. To test this, I'll use netcat:

$ cat > headers <<EOF
GET /hprofile-ak-snc4/41674_660962816_995_n.jpg HTTP/1.0
Host: profile.ak.fbcdn.net
Accept: */*

EOF
$ nc -vvv profile.ak.fbcdn.net 80 < headers > output
Warning: Inverse name lookup failed for `142.231.1.174'
Notice: Real hostname for profile.ak.fbcdn.net [142.231.1.165] is a142-231-1-165.deploy.akamaitechnologies.com
profile.ak.fbcdn.net [142.231.1.174] 80 (http) open
Total received bytes: 12k (11639)
Total sent bytes: 97
$ head output
HTTP/1.0 200 OK
Content-Type: image/jpeg
... snip ...
Content-Length: 11252

(note that I used HTTP/1.0 so the Facebook servers wouldn't try to hold the connection open)

Removing the first 10 lines of output (the HTTP headers) with a text editor and saving the result as output.jpg, I've got the complete image.
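The same strip can be done non-interactively; a sketch, assuming the headers end at the first blank line (GNU sed passes the binary JPEG body through untouched):

$ # delete everything up to and including the blank header/body separator line
$ sed '1,/^[[:space:]]*$/d' output > output.jpg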

So this confirms that Facebook is sending an incorrect Content-Length header (and the image is getting cut off because curl honors the Content-Length while netcat ignores it).

Digging a little further, it seems Aleksi is correct: the Content-Length is correct when the image is sent gzip-compressed. To confirm this, I added Accept-Encoding: gzip to my headers file. Facebook correctly sends back a gzip'd response of the expected length, and uncompressing it yields the correct image.
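The same confirmation is quicker with curl; a sketch (curl doesn't decompress unless told to, so the gzip'd body is piped through gunzip):

$ curl -s -H 'Accept-Encoding: gzip' \
      http://profile.ak.fbcdn.net/hprofile-ak-snc4/41674_660962816_995_n.jpg | gunzip > full.jpg
$ wc -c full.jpg    # 11377: the complete image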

tl;dr: Facebook's Content-Length is incorrect if the Content-Encoding is not gzip.

answered by David Wolever


It seems the server is faulty. When I tested it, the difference between Firefox and wget was that Firefox indicated that it accepts gzip- or deflate-compressed responses to its request, whereas wget did not.

The server's response to Firefox was 11252 bytes of compressed data, while its response to wget was 11377 bytes of uncompressed data. However, the Content-Length it sent was 11252 to both (as David already said).

In other words, it seems the server caches the compressed version and incorrectly reports the compressed size even when it sends the data uncompressed. You receive all the data, but since the server advertises less, wget (and other software that asks for uncompressed data) discards the "extra" data.
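You can see this from the command line; a sketch using curl's -D - option to dump the response headers it receives:

$ curl -s -D - -o /dev/null http://profile.ak.fbcdn.net/hprofile-ak-snc4/41674_660962816_995_n.jpg
$ curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' \
      http://profile.ak.fbcdn.net/hprofile-ak-snc4/41674_660962816_995_n.jpg

Both should report Content-Length: 11252, but only the second response carries Content-Encoding: gzip, and only its body actually is 11252 bytes.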

answered by Aleksi Torhamo