My site was built with XML as the data store and XSLT as the template layer. It appears that Google is not very good at indexing sites that are XML/XSLT based. Are there any efficient, easy-to-implement software components that can render the XSLT just for the Googlebot indexer? It would be even better if they worked with PHP.
Take a look at the PHP XSLT processor.
http://php.net/manual/en/class.xsltprocessor.php
Use as follows:
<?php
$sXml = "<xml>";
$sXml .= "<sudhir>hello sudhir</sudhir>";
$sXml .= "</xml>";
# LOAD XML FILE
$XML = new DOMDocument();
$XML->loadXML( $sXml );
# START XSLT
$xslt = new XSLTProcessor();
$XSL = new DOMDocument();
$XSL->load( 'xsl/index.xsl', LIBXML_NOCDATA);
$xslt->importStylesheet( $XSL );
#PRINT
print $xslt->transformToXML( $XML );
?>
(From http://php.net/manual/en/book.xsl.php)
UPDATE
You asked in the comments how to intercept a request from a specific user agent (e.g. the Googlebot). There are various ways to do this, depending on the web server technology you are using.
On Apache, one method would be to use mod_rewrite to internally divert the processing of the request to a PHP script containing code similar to what we see above. This script retrieves the XML from the originally requested URL and renders the transformation to the client. The rewrite rule would have a Rewrite Condition that compares the HTTP_USER_AGENT header to Google's. Here is an example of the rule (untested, but you should get the idea):
RewriteCond %{HTTP_USER_AGENT} ^(.*)Googlebot(.*)$ [NC]
RewriteRule ^(.*\.xml.*)$ /renderxslt.php?url=$1 [L]
Briefly, the condition matches any User-Agent header containing the string "Googlebot" (case-insensitively, because of the [NC] flag), and the rewrite rule matches any URL path containing ".xml", passing that path to the renderxslt.php page as the url querystring parameter.
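For illustration, here is a minimal sketch of what renderxslt.php might look like. It is untested against a live server and rests on assumptions that are not in the original answer: the XML files are assumed to live under the document root, every page is assumed to use the xsl/index.xsl stylesheet from the earlier example, and the helper names resolveXmlPath and renderXml are hypothetical. Because the url parameter comes from the client, the sketch resolves it with realpath() and refuses anything outside the base directory.

```php
<?php
// renderxslt.php: hypothetical target of the rewrite rule above.
// Assumptions: XML files live under the document root and all pages
// share the stylesheet xsl/index.xsl.

// Map the requested URL to a real file inside $baseDir, or return null.
function resolveXmlPath(string $baseDir, string $url): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null;
    }
    // Drop any query string, then let realpath() resolve ".." and symlinks.
    $relative = ltrim((string) parse_url($url, PHP_URL_PATH), '/');
    $path = realpath($base . DIRECTORY_SEPARATOR . $relative);
    if ($path === false || !is_file($path)) {
        return null;
    }
    // Reject anything that escapes the base directory.
    $prefix = $base . DIRECTORY_SEPARATOR;
    if (strncmp($path, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return $path;
}

// Apply the stylesheet to the XML file and return the transformed markup.
function renderXml(string $xmlPath, string $xslPath): string
{
    $xml = new DOMDocument();
    $xml->load($xmlPath);

    $xsl = new DOMDocument();
    $xsl->load($xslPath);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);

    // transformToXML() returns false on failure; cast that to an empty string.
    return (string) $processor->transformToXML($xml);
}

// Only handle a request when the rewrite rule has supplied ?url=...
if (isset($_GET['url'])) {
    $root = $_SERVER['DOCUMENT_ROOT'];
    $xmlPath = resolveXmlPath($root, (string) $_GET['url']);
    if ($xmlPath === null) {
        http_response_code(404);
        exit('Not found');
    }
    header('Content-Type: text/html; charset=utf-8');
    echo renderXml($xmlPath, $root . '/xsl/index.xsl');
}
```

If different XML files need different stylesheets, the hard-coded stylesheet path would have to be replaced by some lookup of your own; the sketch does not try to read the xml-stylesheet processing instruction.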
A port of mod_rewrite exists for IIS too (http://www.isapirewrite.com/).
Alternatively, with IIS you could use an ASP.NET HTTP module to intercept the request, again checking the user agent (Request.UserAgent, or Request.ServerVariables["HTTP_USER_AGENT"]) for Google's signature. You can then proceed in a similar manner to above by reading the HTML generated by your PHP script, or alternatively by using the ASP.NET XML control:
<asp:Xml ID="Xml1" runat="server" DocumentSource="~/cdlist.xml" TransformSource="~/listformat.xsl"></asp:Xml>