In-Facebook sharing of URLs doesn't pull in og: tag information until run through debugger (even though debugger gives no errors)

Here's an example URL:

http://www.motherjones.com/mojo/2012/05/reince-priebus-lgbt-workplace-discrimination

When pasted into the Facebook status update box, the URL above originally pulled in no image, title, or description -- it remained a bare URL. I then ran it through the debugger, which found no problems, and it now pulls in the headline, image, and description when pasted into the status update box.

For comparison, here's a post I have not yet debugged. It does not transform when pasted into the update box. As soon as I or anyone else runs it through the debugger, however, it will start pulling in the headline (although this one doesn't have an image or description).

http://www.motherjones.com/kevin-drum/2012/05/health-insurers-required-credit-obama-when-sending-out-rebate-checks

This could simply be a timing issue -- FB is slow to prepare the metadata on our pages -- but we have noticed that it takes hours, maybe days for the sharing to start working properly. That's long after the piece has peaked in traffic, so it does us little good.

We started seeing this around April 9.

My question: is there something about our pages that is making Facebook slow to scrape them? What am I missing? If there is a problem, why doesn't the debugger tell me? It does seem like there's a slightly updated version of the doctype to try, but that doesn't seem likely to be the culprit. Also -- is there any reason I shouldn't write a hook to run everything through the debugger at publish time?

Luke Smith asked May 14 '12


1 Answer

Facebook caches the scraped data on its side for faster responses when users share. The documentation for the Like Button says:

When does Facebook scrape my page?

Facebook needs to scrape your page to know how to display it around the site.

Facebook scrapes your page every 24 hours to ensure the properties are up to date. The page is also scraped when an admin for the Open Graph page clicks the Like button and when the URL is entered into the Facebook URL Linter. Facebook observes cache headers on your URLs - it will look at "Expires" and "Cache-Control" in order of preference. However, even if you specify a longer time, Facebook will scrape your page every 24 hours.

The user agent of the scraper is: "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
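To make the cache-header part concrete, here's a minimal sketch (plain Python WSGI, purely illustrative -- the one-hour value is my own example, not anything Facebook mandates) of serving a page with the "Expires" and "Cache-Control" headers the scraper observes:

    from wsgiref.simple_server import make_server
    from email.utils import formatdate
    import time

    def app(environ, start_response):
        # Ask downstream caches (including Facebook's scraper) to keep
        # this page for an hour; per the docs above, Facebook still
        # re-scrapes at least every 24 hours no matter what you send.
        headers = [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Cache-Control", "public, max-age=3600"),
            ("Expires", formatdate(time.time() + 3600, usegmt=True)),
        ]
        start_response("200 OK", headers)
        return [b"<html><head><title>Example</title></head><body>...</body></html>"]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()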

As you can see, when you use the linter (aka the debug tool), it clears the cache for that URL and replaces it with fresh data, which is why you get different sharing results after debugging a page. That doesn't quite square with your report that it sometimes takes days, but maybe their documentation isn't completely accurate on that point -- after all, they have a lot to scrape.

If the page is new, that is, it was never scraped before, then there's no cache and you should get the right result when sharing; it's only when the og: data has changed that you need to clear the cache. So if you update the data for an already-scraped page, be sure to debug it afterwards. You can simply issue an HTTP request from your server to the same URL the debug tool uses -- you don't need to go through the web interface.
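For example (a hedged sketch in Python -- as far as I know the debug tool ultimately hits the Graph API with a scrape=true parameter, but verify the endpoint and whether an access_token is also required before relying on this):

    import urllib.parse
    import urllib.request

    GRAPH_URL = "https://graph.facebook.com/"

    def force_rescrape(page_url):
        # POSTing id=<url>&scrape=true asks Facebook to re-scrape the
        # URL and refresh its cached og: data. Newer API versions may
        # also require an access_token parameter here.
        body = urllib.parse.urlencode({"id": page_url, "scrape": "true"}).encode()
        with urllib.request.urlopen(GRAPH_URL, data=body) as resp:
            return resp.read().decode()

    print(force_rescrape(
        "http://www.motherjones.com/mojo/2012/05/reince-priebus-lgbt-workplace-discrimination"))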

If things still don't work as you expect, check the user agent string of incoming requests and compare it with facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php). If it matches, log the response you send back and compare it with the results you get when sharing; if they're inconsistent, file a bug report. As for "hooking" a debugger request into every publish, I would advise against it -- it's unnecessary traffic if things work as they should, and it's better to solve the problem than to work around it.
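As a sketch of that check (a hypothetical WSGI middleware in Python -- adapt it to whatever actually serves your pages):

    # Spot Facebook's scraper by its user agent and log what is served
    # to it, so it can be compared with what the debugger and the
    # share box show.
    FB_SCRAPER_UA_PREFIX = "facebookexternalhit/"

    def log_facebook_scraper(app):
        def middleware(environ, start_response):
            ua = environ.get("HTTP_USER_AGENT", "")
            if ua.startswith(FB_SCRAPER_UA_PREFIX):
                def logging_start_response(status, headers, exc_info=None):
                    print("FB scraper: %s %s -> %s"
                          % (environ.get("REQUEST_METHOD"),
                             environ.get("PATH_INFO"), status))
                    return start_response(status, headers, exc_info)
                return app(environ, logging_start_response)
            return app(environ, start_response)
        return middleware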

Nitzan Tomer answered Oct 24 '22