
Facebook not able to scrape my URL

My page has the HTML structure shown below. I have added the Open Graph (og) meta tags, but Facebook still cannot scrape any information from my site.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"  xmlns:fb="http://www.facebook.com/2008/fbml">
    <head>
            <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
            <title>My Site</title>
            <meta content="This is my title" property="og:title">
            <meta content="This is my description" property="og:description">
            <meta content="http://ia.media-imdb.com/images/rock.jpg" property="og:image">
            <meta content="<MYPAGEID>" property="fb:page_id">
            .......
    </head>
    <body>
    .....

When I enter the URL in the Facebook debugger (https://developers.facebook.com/tools/debug), I get the following messages:

Scrape Information
Response Code   404

Critical Errors That Must Be Fixed
Bad Response Code   URL returned a bad HTTP response code.


Errors that must be fixed

Missing Required Property   The 'og:url' property is required, but not present.
Missing Required Property   The 'og:type' property is required, but not present.
Missing Required Property   The 'og:title' property is required, but not present.


Open Graph Warnings That Should Be Fixed
Inferred Property   The 'og:url' property should be explicitly provided, even if a value can be inferred from other tags.
Inferred Property   The 'og:title' property should be explicitly provided, even if a value can be inferred from other tags.

Why is Facebook not reading the meta tag information? The page is publicly accessible and not hidden behind a login.

UPDATE

OK, I did a bit of debugging and this is what I found. I am using the PHP CodeIgniter framework, and I have an .htaccess rule in my directory that removes index.php from the URL.

So, when I feed the URL to the Facebook debugger (https://developers.facebook.com/tools/debug) without index.php, Facebook reports a 404, but when I feed it the URL with index.php, it is able to parse my page.

Now, how do I make Facebook scrape the content when the URL doesn't contain index.php?

This is my .htaccess file:

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /

    #Removes access to the system folder by users.
    #Additionally this will allow you to create a System.php controller,
    #previously this would not have been possible.
    #'system' can be replaced if you have renamed your system folder.
    RewriteCond %{REQUEST_URI} ^system.*
    RewriteRule ^(.*)$ /index.php?/$1 [L]

    #When your application folder isn't in the system folder
    #This snippet prevents user access to the application folder
    #Submitted by: Fabdrol
    #Rename 'application' to your applications folder name.
    RewriteCond %{REQUEST_URI} ^application.*
    RewriteRule ^(.*)$ /index.php?/$1 [L]

    #Checks to see if the user is attempting to access a valid file,
    #such as an image or css document, if this isn't true it sends the
    #request to index.php
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?/$1 [L]
</IfModule>

<IfModule !mod_rewrite.c>
    # If we don't have mod_rewrite installed, all 404's
    # can be sent to index.php, and everything works as normal.
    # Submitted by: ElliotHaughin

    ErrorDocument 404 /index.php
</IfModule>
Ninja asked Apr 10 '12 21:04



2 Answers

The Facebook documentation includes details on the Open Graph Protocol and how to include the correct meta tags so that Facebook can scrape your URL accurately.

https://developers.facebook.com/docs/opengraphprotocol/

Essentially, what you'll want to do is include some special og: tags instead of (or in addition to) your existing meta tags.

  <head>
    <title>Ninja Site</title>
    <meta property="og:title" content="The Ninja"/>
    <meta property="og:type" content="movie"/>
    <meta property="og:url" content="http://www.nin.ja"/>
    <meta property="og:image" content="http://nin.ja/ninja.jpg"/>
    <meta property="og:site_name" content="Ninja"/>
    <meta property="fb:admins" content="USER_ID"/>
    <meta property="og:description"
          content="Superhuman or supernatural powers were often
                   associated with the ninja. Some legends include
                   flight, invisibility and shapeshifting..."/>
    ...
  </head>

If you have an .htaccess file rewriting requests in a way that makes it difficult for Facebook to scrape your URL, you might be able to get away with detecting Facebook's crawler in your .htaccess and feeding it the correct tags. I believe the user agent that the Facebook crawler sends is this:

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
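
To reproduce what the crawler sees, one option is to request the page yourself while sending that user agent string. The following is a minimal sketch using only Python's standard library; the names `build_crawler_request` and `fetch_status` are illustrative and not part of any Facebook tooling.

```python
import urllib.error
import urllib.request

# User agent string quoted above; Facebook may change it over time.
FACEBOOK_UA = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"


def build_crawler_request(url):
    """Build a GET request that identifies itself like the Facebook crawler."""
    return urllib.request.Request(url, headers={"User-Agent": FACEBOOK_UA})


def fetch_status(url, timeout=10):
    """Return the HTTP status code the server sends for this user agent."""
    try:
        with urllib.request.urlopen(build_crawler_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still what we want.
        return err.code
```

Calling `fetch_status` on the same page with and without index.php in the URL should show whether the 404 reported by the debugger can be reproduced outside Facebook. Note that `urlopen` follows redirects, so the value returned is the status of the final response.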

The documentation also has a section talking about making sure that their crawlers can access your site.

Depending on your configuration, you can test this by looking at your server's access log. On a UNIX system running Apache, the log is often located at /var/log/httpd/access_log (Debian and Ubuntu use /var/log/apache2/access.log by default).
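
Reading the log by eye works, but a small filter makes the crawler's requests easier to spot. This is a rough sketch that assumes Apache's "combined" log format (which records the user agent); the function name `crawler_hits` and the regular expression are illustrative and may need adjusting for a custom LogFormat.

```python
import re

# Matches the request line and status code of a common/combined log entry,
# e.g. "GET /welcome HTTP/1.1" 404 512
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def crawler_hits(lines, agent="facebookexternalhit"):
    """Yield (path, status) for log lines that mention the crawler's user agent."""
    for line in lines:
        if agent not in line:
            continue
        match = LOG_PATTERN.search(line)
        if match:
            yield match.group("path"), int(match.group("status"))
```

Passing an open log file to `crawler_hits` and printing each pair shows which paths the crawler requested and which of them returned 404.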

So you could use an entry similar to this in your .htaccess file:

RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit
RewriteRule ^(.*)$ ogtags.php?$1 [L,QSA]

The [L,QSA] flags state that this is the last rule applied to the current request (L) and that any query string on the original request is appended when the URL is rewritten (QSA, Query String Append). For example, a URL such as:

https://example.com/?id=foo&action=bar

will be passed to ogtags.php like this: ogtags.php?id=foo&action=bar. Your ogtags.php file will have to generate dynamic og: meta tags according to the parameters that were passed.

Now, whenever your .htaccess rules detect the Facebook user agent, the request will be served by ogtags.php (which can contain the correct og: meta information). Be aware of any other rules in your .htaccess and how they might interact with the new ones.
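
The contents of ogtags.php are left open here. As a rough, language-neutral illustration of the logic such a handler needs (written in Python rather than PHP, with a hypothetical `render_og_tags` function), the main points are to emit one meta tag per Open Graph property and to escape every value before it goes into the markup:

```python
from html import escape


def render_og_tags(properties):
    """Render a dict of Open Graph properties as <meta> tags, escaping values."""
    tags = []
    for name, value in properties.items():
        tags.append('<meta property="{}" content="{}"/>'.format(
            escape(name, quote=True), escape(str(value), quote=True)))
    return "\n".join(tags)
```

A real handler would first look up the title, URL, image, and description for the requested page from the query parameters, then pass them to something like this and wrap the result in a minimal HTML document.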

From the .htaccess entries that you have detailed, I would recommend placing this new "Facebook rule" as the very first rule.

Lix answered Nov 08 '22 08:11



I had the same problem: "Bad Response Code: URL returned a bad HTTP response code."

Oddly, this is what solved it. I added

    <meta property="og:locale" content="en_US" />

to my site's <head> and it worked.

Also, don't forget that in your application dashboard (where you get your App ID) you must have at least "Website with Facebook Login" enabled and enter the URL of the website. Otherwise it won't work, even if you are not using Facebook Login on your site.

MistaPrime answered Nov 08 '22 06:11
