 

Google's Indexing of XSLT Pages

Tags: php, xml, seo, xslt

My site uses XML as its data store and XSLT as its template layer. Google does not seem to be very good at indexing XML/XSLT-based sites. Are there any efficient, easy-to-implement software components that can render the XSLT just for the Googlebot indexer? Ideally they would work with PHP.

asked Feb 23 '11 at 04:02 by monksy


1 Answer

Take a look at the PHP XSLT processor.

http://php.net/manual/en/class.xsltprocessor.php

Use as follows:

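<?php
// Build a small XML document as a string
$sXml  = "<xml>";
$sXml .= "<sudhir>hello sudhir</sudhir>";
$sXml .= "</xml>";

// Load the XML from the string (use $XML->load('file.xml') to read a file instead)
$XML = new DOMDocument();
$XML->loadXML($sXml);

// Load the XSL stylesheet and import it into the XSLT processor
$xslt = new XSLTProcessor();
$XSL = new DOMDocument();
$XSL->load('xsl/index.xsl', LIBXML_NOCDATA);
$xslt->importStylesheet($XSL);

// Run the transformation and print the result
print $xslt->transformToXML($XML);
?>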

(From http://php.net/manual/en/book.xsl.php)
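The example above depends on an external xsl/index.xsl that is not shown. As a self-contained sketch, here is the same transformation with a small stylesheet embedded as a string. The stylesheet and the transformXml() helper are hypothetical illustrations, not part of PHP, and the code assumes the dom and xsl extensions are enabled.

```php
<?php
// Self-contained variant of the example above: the stylesheet is an inline
// string instead of the file xsl/index.xsl, so the script runs on its own.
// Assumes PHP's dom and xsl extensions are installed.

function transformXml(string $xmlSource, string $xslSource): string
{
    $xml = new DOMDocument();
    if (!$xml->loadXML($xmlSource)) {
        throw new RuntimeException('Invalid XML input');
    }

    $xsl = new DOMDocument();
    if (!$xsl->loadXML($xslSource, LIBXML_NOCDATA)) {
        throw new RuntimeException('Invalid XSL stylesheet');
    }

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);

    // transformToXML() returns false on failure, otherwise the serialized result
    $result = $processor->transformToXML($xml);
    if ($result === false) {
        throw new RuntimeException('XSLT transformation failed');
    }
    return $result;
}

// Hypothetical stylesheet: wraps the <sudhir> text in an HTML heading.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html><body><h1><xsl:value-of select="/xml/sudhir"/></h1></body></html>
  </xsl:template>
</xsl:stylesheet>
XSL;

$html = transformXml('<xml><sudhir>hello sudhir</sudhir></xml>', $xslSource);
print $html;
```

Checking the return values matters here because DOMDocument::loadXML() and XSLTProcessor::transformToXML() report failure by returning false rather than by throwing, so an unchecked failure would silently print nothing.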

UPDATE

You asked in a comment how to intercept a request from a specific user agent (e.g. Googlebot). There are various ways to do this, depending on the web server technology you are using.
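Whichever server you use, the check itself is a case-insensitive substring match on the User-Agent header. If you would rather not touch the server configuration, the same test can be sketched directly in PHP. The helper name isGooglebot() is hypothetical, and the code assumes PHP 7.1 or later for the nullable type.

```php
<?php
// Case-insensitive substring test on the User-Agent header, i.e. the same
// check that the RewriteCond pattern ^(.*)Googlebot(.*)$ [NC] performs.
// isGooglebot() is a hypothetical helper name.

function isGooglebot(?string $userAgent): bool
{
    return $userAgent !== null && stripos($userAgent, 'googlebot') !== false;
}

// In a web request the header is exposed as $_SERVER['HTTP_USER_AGENT'];
// it may be missing, hence the null coalescing operator.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? null;
if (isGooglebot($ua)) {
    // At this point you would serve the server-side XSLT rendering
    // instead of the raw XML.
}
```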

On Apache, one method is to use mod_rewrite to internally divert the request to a PHP script containing code similar to the example above. This script loads the XML from the originally requested URL and sends the transformed output to the client. The rewrite rule is guarded by a RewriteCond that checks the HTTP_USER_AGENT header for Google's crawler signature. Here is an example of the rule (untested, but you should get the idea):

RewriteCond %{HTTP_USER_AGENT} ^(.*)Googlebot(.*)$ [NC]
RewriteRule ^(.*\.xml.*)$ /renderxslt.php?url=$1 [L]

Briefly, the condition checks whether the User-Agent header contains the string "Googlebot" anywhere (case-insensitively, because of the [NC] flag), and the rewrite rule matches any URL path containing ".xml" and passes that path to the renderxslt.php page as the url query-string parameter.
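The answer leaves renderxslt.php to the reader. Below is one possible sketch, under assumptions that go beyond the original: each XML file names its stylesheet in an xml-stylesheet processing instruction with a relative href, the XML files live under the document root, and the dom and xsl extensions are installed. The function names findStylesheetHref() and renderXmlFile() are hypothetical.

```php
<?php
// renderxslt.php: a minimal sketch of the script the RewriteRule diverts to.
// Assumptions (not from the answer): each XML file names its stylesheet in an
// xml-stylesheet processing instruction with a relative href, the files live
// under DOCUMENT_ROOT, and the dom/xsl extensions are installed.

// Return the href of the first xml-stylesheet processing instruction, if any.
function findStylesheetHref(DOMDocument $xml): ?string
{
    $xpath = new DOMXPath($xml);
    foreach ($xpath->query("/processing-instruction('xml-stylesheet')") as $pi) {
        if (preg_match('/href\s*=\s*["\']([^"\']+)["\']/', $pi->data, $m)) {
            return $m[1];
        }
    }
    return null;
}

// Load an XML file, apply the stylesheet it references, return the output.
function renderXmlFile(string $xmlPath): string
{
    $xml = new DOMDocument();
    if (!$xml->load($xmlPath)) {
        throw new RuntimeException("Cannot load $xmlPath");
    }
    $href = findStylesheetHref($xml);
    if ($href === null) {
        throw new RuntimeException("No xml-stylesheet instruction in $xmlPath");
    }

    // Resolve the (assumed relative) href against the XML file's directory.
    $xsl = new DOMDocument();
    if (!$xsl->load(dirname($xmlPath) . '/' . $href, LIBXML_NOCDATA)) {
        throw new RuntimeException("Cannot load stylesheet $href");
    }

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);
    $output = $processor->transformToXML($xml);
    if ($output === false) {
        throw new RuntimeException("Transformation of $xmlPath failed");
    }
    return $output;
}

// Request handling runs only under a web server, not from the command line.
if (PHP_SAPI !== 'cli') {
    $root = realpath($_SERVER['DOCUMENT_ROOT']);
    $path = realpath($root . '/' . ltrim($_GET['url'] ?? '', '/'));

    // Only serve existing .xml files inside the document root (blocks ../ tricks).
    if ($root === false || $path === false
        || strpos($path, $root . DIRECTORY_SEPARATOR) !== 0
        || substr($path, -4) !== '.xml') {
        http_response_code(404);
        exit;
    }

    header('Content-Type: text/html; charset=UTF-8');
    echo renderXmlFile($path);
}
```

Resolving the path with realpath() and checking the document-root prefix keeps the url parameter from being used to read arbitrary files on the server, which matters because the rule passes it straight from the request. The ltrim() call copes with the path arriving with or without a leading slash, which differs between .htaccess and server-level rewrite rules.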

A port of mod_rewrite exists for IIS too (http://www.isapirewrite.com/).

Alternatively, on IIS you could use an ASP.NET HTTP module to intercept the request, again checking the user agent (Request.UserAgent, or Request.Headers["User-Agent"]) for Google's signature. You can then proceed as above, either by reading the HTML generated by your PHP script or by using the ASP.NET Xml control:

<asp:Xml ID="Xml1" runat="server" DocumentSource="~/cdlist.xml" TransformSource="~/listformat.xsl"></asp:Xml>
answered Oct 26 '22 at 22:10 by Mike Chamberlain