
Facebook requests for {url}/no_facebook_preview_picture.jpg on 404 links

We operate a URL shortener. Over the last week or so we've started seeing lots of odd requests for {normal url}/no_facebook_preview_picture.jpg from Facebook-owned IPs with the user agent facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php).

If I post a normal link to our site on my wall (set as Only Me so I can test) I get the following entry in our access log

66.220.152.6 - - [05/Feb/2013:16:31:36 +0000] "GET /44_U HTTP/1.1" 200 1314 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "-"

However, if I post a link that returns 404 or 410 (a spam link removed after creation), I get this:

69.171.237.15 - - [05/Feb/2013:16:49:16 +0000] "GET /notexistURL HTTP/1.1" 404 1319 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "-"

then within an hour or so

173.252.110.113 - - [05/Feb/2013:17:15:15 +0000] "GET /notexistURL/no_facebook_preview_picture.jpg HTTP/1.1" 404 0 "-" "facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)" "-"

A WHOIS lookup of that IP reports:

NetName FACEBOOK-INC
NetHandle   NET-173-252-64-0-1

So they are definitely Facebook IPs.

We're getting about 10-20 requests like this a day, all identical. We can only go back 7 days in the log files, but these requests were already happening 7 days ago.

I've tested with unique links, so there is no other way for anything to have found them. I don't personally use Facebook much, and all links except my test links were created and posted by other users. I recognize all the applications linked to my Facebook account and there is nothing unusual among them, so I don't think this is a 3rd party app (I can provide a list if needed, but they're all big-name apps).

While examining the log files, I noticed that Facebook doesn't even seem to build these requests intelligently: it just blindly appends the string /no_facebook_preview_picture.jpg to the end of the URL, even after a query string. For example:

69.171.228.114 - - [05/Feb/2013:17:19:13 +0000] "GET /iAmNotARealURL1234777?ref=fb&cows_go=moo HTTP/1.1" 404 1118 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "-"
69.171.228.114 - - [05/Feb/2013:17:19:13 +0000] "GET /iamnotarealurl1234777 HTTP/1.1" 404 1118 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "-"
173.252.103.4 - - [05/Feb/2013:17:44:41 +0000] "GET /iAmNotARealURL1234777?ref=fb&cows_go=moo/no_facebook_preview_picture.jpg HTTP/1.1" 404 1118 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "-"
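
To make the pattern above concrete, here is a minimal sketch of how such requests could be recognized and mapped back to the link they were derived from. It assumes only what the log lines show, namely that the suffix is appended verbatim to the end of the request target; the function names strip_fb_suffix and is_fb_placeholder_request are hypothetical helpers made up for illustration.

```shell
# Suffix observed at the end of the odd requests in the access log.
SUFFIX='/no_facebook_preview_picture.jpg'

# Print the request target with the trailing suffix removed.
# Targets that do not end with the suffix are printed unchanged.
strip_fb_suffix() {
  printf '%s\n' "${1%"$SUFFIX"}"
}

# Return 0 only when the request target ends with the suffix.
is_fb_placeholder_request() {
  case "$1" in
    *"$SUFFIX") return 0 ;;
    *) return 1 ;;
  esac
}
```

In the log format shown here the request target is the 7th whitespace-separated field, so something like awk '{print $7}' could feed these helpers.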

Googling turns up lots of random results, mostly from link originators, but I couldn't find any information about what these requests are.

What are these requests? What does Facebook need them for? Is this an error in our application or can these requests be safely ignored?

Update:

On some days we're now getting 200-300 hits to these URLs, and on one day more than 600:

[sr@ns309372 nginx]$ for DAYLOG in `find ./ | grep "dftbashort.log-"`; do COUNT=`cat $DAYLOG | grep no_facebook_preview_picture | wc -l`; echo "${DAYLOG} has ${COUNT} occurences"; done
./dftbashort.log-20130201 has 0 occurences
./dftbashort.log-20130130 has 2 occurences
./dftbashort.log-20130129 has 2 occurences
./dftbashort.log-20130128 has 2 occurences
./dftbashort.log-20130202 has 378 occurences
./dftbashort.log-20130207 has 222 occurences
./dftbashort.log-20130205 has 257 occurences
./dftbashort.log-20130209 has 178 occurences
./dftbashort.log-20130131 has 2 occurences
./dftbashort.log-20130203 has 266 occurences
./dftbashort.log-20130206 has 667 occurences
./dftbashort.log-20130204 has 12 occurences
./dftbashort.log-20130127 has 4 occurences
./dftbashort.log-20130208 has 260 occurences
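
The listing above comes out in the order find returns the files, which makes the jump around 2 February hard to spot. A possible variant that prints the counts in date order is sketched below. It assumes the rotated logs follow the dftbashort.log-YYYYMMDD naming seen above and sit in one directory; count_fb_placeholder_hits is a hypothetical name.

```shell
# Print "<logfile> <count>" for each rotated log in the given directory
# (default: current directory). The shell expands the glob in sorted
# order, so YYYYMMDD suffixes come out chronologically.
count_fb_placeholder_hits() {
  dir="${1:-.}"
  for log in "$dir"/dftbashort.log-*; do
    [ -f "$log" ] || continue   # skip the literal pattern if nothing matched
    # grep -c counts matching lines, like `grep ... | wc -l`
    printf '%s %s\n' "$log" "$(grep -c 'no_facebook_preview_picture' "$log")"
  done
}
```

Calling count_fb_placeholder_hits . from the nginx log directory should give the same numbers as the loop above, one file per line.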

We don't provide any Open Graph meta tags, and the page has no content other than a meta/JavaScript redirect.

Asked by Smudge, Feb 05 '13 18:02


1 Answer

I'm pretty sure this is the share scraper trying to build a preview of your URL. Run the URL through Facebook's Debug Tool and you'll see what Facebook sees and what it is looking for.

I'm not sure what the /notexistURL/no_facebook_preview_picture.jpg requests are, assuming nothing in your code points to such a URL. If I had to guess, I'd say it's some sort of default or fallback used when there are no meta tags, or possibly a bug. I'm fairly confident that if you include the correct meta tags for Facebook, it will grab those and stop making invalid requests, with the added benefit that shares of your URLs will look better on Facebook.com and on other sites that support the same tags.
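
For reference, the meta tags in question are Open Graph tags. Below is a rough sketch of the four basic properties (og:title, og:type, og:url, og:image), wrapped in a shell function only so that it can be dropped into a script. The function name print_og_tags and its arguments are hypothetical, the values are not HTML-escaped, and whether serving these tags actually stops the odd requests is a guess, not something verified here.

```shell
# Print a minimal set of Open Graph meta tags for a page.
#   $1 = title, $2 = canonical URL, $3 = preview image URL
# Assumption: the values contain no characters that need HTML escaping.
print_og_tags() {
  cat <<EOF
<meta property="og:title" content="$1" />
<meta property="og:type" content="website" />
<meta property="og:url" content="$2" />
<meta property="og:image" content="$3" />
EOF
}
```

The output would go inside the head element of whatever HTML the scraper receives for the link.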

Answered by Igy, Oct 29 '22 15:10