I'm stumped on this one. Try loading this image in your browser, and then save it to your hard disk.
http://profile.ak.fbcdn.net/hprofile-ak-snc4/41674_660962816_995_n.jpg
It's a valid JPEG file at 11377 bytes.
Now try to download it with wget or curl. Only 11252 bytes show up, and the bottom-right part of the image is missing.
What gives?
Here goes…
Taking a packet dump, I see that Facebook returns the same Content-Length to Safari as it does to curl, and that Content-Length is the incorrect 11252:
GET /hprofile-ak-snc4/41674_660962816_995_n.jpg HTTP/1.1
User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
Host: profile.ak.fbcdn.net
Accept: */*

HTTP/1.1 200 OK
Content-Type: image/jpeg
... snip ...
Content-Length: 11252
And with Safari:
GET /hprofile-ak-snc4/41674_660962816_995_n.jpg HTTP/1.1
Host: profile.ak.fbcdn.net
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-us) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27
... snip ...

HTTP/1.1 200 OK
Content-Type: image/jpeg
... snip ...
Content-Length: 11252
So I'm going to guess Facebook is sending an incorrect Content-Length. To test this, I'll use netcat:
$ cat > headers << EOF
GET /hprofile-ak-snc4/41674_660962816_995_n.jpg HTTP/1.0
Host: profile.ak.fbcdn.net
Accept: */*

EOF
$ nc -vvv profile.ak.fbcdn.net 80 < headers > output
Warning: Inverse name lookup failed for `142.231.1.174'
Notice: Real hostname for profile.ak.fbcdn.net [142.231.1.165] is a142-231-1-165.deploy.akamaitechnologies.com
profile.ak.fbcdn.net [142.231.1.174] 80 (http) open
Total received bytes: 12k (11639)
Total sent bytes: 97
$ head output
HTTP/1.0 200 OK
Content-Type: image/jpeg
... snip ...
Content-Length: 11252
(note that I used HTTP/1.0 so the Facebook servers wouldn't try to hold the connection open)
After removing the first 10 lines of output in a text editor and saving the result as output.jpg, I've got the complete image.
So this confirms that Facebook is sending an incorrect Content-Length header (and the image is getting cut off because curl respects the Content-Length while netcat doesn't).
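The difference between the two behaviors can be sketched with a small self-contained demo. This is a local stand-in server, not Facebook's actual setup: it deliberately advertises Content-Length: 11252 while sending an 11377-byte body, then one client trusts the header (like curl/wget) and another reads the raw stream until EOF (like netcat):

```python
import http.client
import socket
import threading

# Stand-in for the faulty server: body is 11377 bytes, but the advertised
# Content-Length is 11252, mirroring the mismatch in the packet dumps above.
BODY = b"A" * 11377
ADVERTISED = 11252

def serve_once(server):
    conn, _ = server.accept()
    conn.recv(4096)  # read and ignore the request
    conn.sendall(b"HTTP/1.0 200 OK\r\n"
                 b"Content-Type: image/jpeg\r\n"
                 b"Content-Length: %d\r\n\r\n" % ADVERTISED + BODY)
    conn.close()

server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]

# Client 1: like curl/wget, trust Content-Length -> truncated body.
threading.Thread(target=serve_once, args=(server,), daemon=True).start()
h = http.client.HTTPConnection("127.0.0.1", port)
h.request("GET", "/img.jpg")
trusting = h.getresponse().read()

# Client 2: like netcat, read the raw stream until EOF -> full body.
threading.Thread(target=serve_once, args=(server,), daemon=True).start()
s = socket.create_connection(("127.0.0.1", port))
s.sendall(b"GET /img.jpg HTTP/1.0\r\nHost: x\r\n\r\n")
raw = b""
while chunk := s.recv(4096):
    raw += chunk
s.close()
netcat_body = raw.partition(b"\r\n\r\n")[2]

print(len(trusting), len(netcat_body))  # 11252 11377
```

Both clients receive the same bytes on the wire; only the one that stops reading at the advertised length loses the tail of the image.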
Digging a little further, it seems like Aleski is correct: the Content-Length is correct when the image is sent gzip-compressed. To confirm this, I added Accept-Encoding: gzip to my headers file. Facebook correctly sends back a gzip'd response of the expected length, and uncompressing it yields the correct image.
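The arithmetic behind that confirmation can be sketched with stand-in bytes (the real file is the 11377-byte JPEG; the byte pattern and sizes here are placeholders):

```python
import gzip

# Stand-in for the JPEG: a repetitive pattern that compresses well.
image = bytes(range(256)) * 45          # 11520 bytes, arbitrary stand-in
gzipped_body = gzip.compress(image)
content_length = len(gzipped_body)      # what the server advertises either way

# Compressed response: header and body agree, and decompressing it
# recovers the complete image -- nothing is cut off.
assert len(gzipped_body) == content_length
assert gzip.decompress(gzipped_body) == image

# Uncompressed response: the same Content-Length is now too small,
# so a client that trusts it drops the tail of the image.
assert len(image) > content_length
```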
tl;dr: Facebook's Content-Length is incorrect if the Content-Encoding is not gzip.
It seems the server is faulty. When I tested it, the difference between Firefox and wget was that Firefox indicated that it accepts gzip- or deflate-compressed responses to its request, whereas wget did not.
The server's response to Firefox was 11252 bytes of compressed data, and its response to wget was 11377 bytes of uncompressed data. However, the Content-Length it sent was 11252 to both (as David already said).
In other words, it seems that the server is caching the compressed version and incorrectly sending the compressed size even when sending the data uncompressed. You get all the data, but since the server advertises less data, wget (and other software that asks for uncompressed data) discards the "extra" data.
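That hypothesis can be modeled in a few lines. This is purely a guess at the shape of the bug, not Facebook's actual code: the cache holds the gzip'd body, and Content-Length is taken from the cached entry even when the body goes out uncompressed:

```python
import gzip

# Hypothetical model of the server-side bug: the cache stores the
# compressed body, and Content-Length always comes from the cached size.
cache = {}

def respond(path, original_body, client_accepts_gzip):
    if path not in cache:
        cache[path] = gzip.compress(original_body)
    compressed = cache[path]
    content_length = len(compressed)  # BUG: always the compressed size
    body = compressed if client_accepts_gzip else original_body
    return content_length, body

image = bytes(range(256)) * 45  # stand-in for the JPEG

# Uncompressed response: the header undercounts, so the client truncates.
length, body = respond("/img.jpg", image, client_accepts_gzip=False)
assert length < len(body)

# Gzip'd response: header and body agree, so gzip clients never notice.
length, body = respond("/img.jpg", image, client_accepts_gzip=True)
assert length == len(body)
```

This matches the observed symptoms: clients that accept gzip (Firefox, Safari) get a consistent response, while clients that don't (wget, curl) get a full body with a too-small header and throw away the tail.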