How do I correct my htaccess for proxying search engine crawl requests?

I have built a website with React on the front end and WordPress as the back end. So that search engine crawlers can see my site, I have set up server-side prerendering, and I am trying to configure .htaccess to proxy requests from search engines so that they are served the pre-rendered pages.

For testing, I am using the "Fetch as Google" tool in Google Webmasters.

Here is my attempt:

<IfModule mod_rewrite.c>
    RewriteEngine On
    <IfModule mod_proxy_http.c>
    RewriteCond %{REQUEST_FILENAME} -f [OR]
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteCond %{HTTP_USER_AGENT} googlebot [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_
    # Proxy the request ... works for inner pages only
    RewriteRule ^(?!.*?)$ http://example.com:3000/https://example.com/$1 [P,L]

    </IfModule>
</IfModule>
# BEGIN WordPress
<IfModule mod_rewrite.c>
   RewriteEngine On
   RewriteBase /
   RewriteRule ^index\.php$ - [L]
   RewriteCond %{REQUEST_FILENAME} !-f
   RewriteCond %{REQUEST_FILENAME} !-d
   RewriteRule . /index.php [L]
</IfModule>
# END WordPress

My problem is that this directive doesn't work for my home page, and works only for inner pages (http://example.com/inner-page/):

RewriteRule ^(?!.*?)$ http://example.com:3000/https://example.com/$1 [P,L]

When I change this line to the following, the home page request is proxied correctly, but the inner pages stop working:

RewriteRule ^(index\.php)?(.*) http://example.com:3000/https://example.com/$1 [P,L]

Could you help me fix the rewrite rule so that my home page is also proxied correctly for the googlebot?

Naweed Chougle asked Jun 07 '17 17:06


2 Answers

Change the RewriteRule to:

RewriteRule ^(.*)/?$ http://example.com:3000/https://example.com/$1 [P,L]
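
As a rough check of why this pattern helps, the sketch below compares the three patterns from the question and this answer against the paths mod_rewrite sees in .htaccess, where the leading slash is stripped (so the home page is the empty string). It uses Python's re module as a stand-in for the regex engine in mod_rewrite, an assumption that should hold for patterns this simple. The names HOME, INNER, PATTERNS and first_group are made up for the illustration.

```python
import re

# Paths as mod_rewrite sees them in a document-root .htaccess
# (leading slash stripped). Both values are illustrative.
HOME = ""
INNER = "inner-page/"

PATTERNS = {
    "original": r"^(?!.*?)$",
    "asker_variant": r"^(index\.php)?(.*)",
    "suggested": r"^(.*)/?$",
}


def first_group(pattern, path):
    """Return what $1 would hold, or None if the rule does not match."""
    m = re.search(pattern, path)
    if m is None:
        return None
    if m.re.groups == 0:
        return ""
    # A group that did not take part in the match is treated as empty here.
    return m.group(1) or ""


for name, pattern in PATTERNS.items():
    print(name, repr(first_group(pattern, HOME)), repr(first_group(pattern, INNER)))
```

Under these assumptions, the original pattern matches neither path, because the negative lookahead (?!.*?) can never succeed: .*? always matches the empty string. The ^(index\.php)?(.*) variant matches both paths but leaves $1 empty for the inner page, so the path is lost. ^(.*)/?$ matches both, with the full path in $1.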

Croises answered Nov 11 '22 00:11


First, avoid the repetition by merging the two mod_rewrite blocks into one:

<IfModule mod_rewrite.c>
    RewriteEngine On
    <IfModule mod_proxy_http.c>
    RewriteCond %{REQUEST_FILENAME} -f [OR]
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteCond %{HTTP_USER_AGENT} googlebot [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_
    # Proxy the request ... works for inner pages only
    RewriteRule ^(?!.*?)$ http://example.com:3000/https://example.com/$1 [P,L]
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]

    </IfModule>
</IfModule>

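Then change ^(?!.*?)$ to ^(.*)$, or use a stricter character class such as [a-zA-Z0-9-.]* inside the capture group (add / to the class if it must match nested paths). Keep the capturing parentheses so that $1 is populated, and don't forget the zero-or-more quantifier (*), which lets the pattern match the empty path of the home page.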

The corrected code is:

<IfModule mod_rewrite.c>
    RewriteEngine On
    <IfModule mod_proxy_http.c>
    RewriteCond %{REQUEST_FILENAME} -f [OR]
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteCond %{HTTP_USER_AGENT} googlebot [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_
    # Proxy the request to the prerender server (home page and inner pages)
    RewriteRule ^(.*)$ http://example.com:3000/https://example.com/$1 [P,L]
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]

    </IfModule>
</IfModule>

Sagar V answered Nov 11 '22 01:11