XSLT - suitable for the task?

I need to transform a huge XML document into multiple HTML documents. The XML looks like this:

<society>
  <party_members>
    <member id="1" first_name="" last_name="O'Brien">
      <ministry_id>1</ministry_id>
      <ministry_id>3</ministry_id>
    </member>
    <member id="2" first_name="Julia" last_name="">
      <ministry_id>2</ministry_id>
    </member>
    <member id="3" first_name="Winston" last_name="Smith">
      <ministry_id>1</ministry_id>
    </member>
  </party_members>
  <ministries>
    <ministry>
      <id>1</id>
      <short_title>Minitrue</short_title>
      <long_title>Ministry of truth</long_title>
      <concerns>News, entertainment,education and arts </concerns>      
    </ministry>
    <ministry>
      <id>2</id>
      <short_title>Minipax</short_title>
      <long_title>Ministry of Peace</long_title>
      <concerns>War</concerns>
    </ministry>
    <ministry>
      <id>3</id>
      <short_title>Minilove</short_title>
      <long_title>Ministry of Love</long_title>
      <concerns>Dissidents</concerns>      
    </ministry>
  </ministries>
</society>

The number of party members can be very large (millions), while the number of ministries is small, around 300-400. For each party member there should be one output HTML document with the following content:

<html>  
  <body>
    <h2>Party member: Winston Smith</h2>
    <h3>Works in:</h3>
    <div class="ministry">
      <h4>Ministry of truth</h4> - Minitrue
      <h5>Ministry of truth <i>concerns</i> itself with <i>News, entertainment,education and arts</i></h5>  
    </div>
  </body>
</html>

The number of output documents should equal the number of party members.

I'm struggling with XSLT and can't get it to work.

Please help me decide whether XSLT is a good tool for this job and, if it is, give me a hint on how to implement it: which XSLT constructs should be used, etc.

Of course I could simply write a small transformation in a procedural language, but I'm looking for an 'apply a transformation template' approach rather than procedural parsing and modification, so that I can hand the template to other users for further changes (CSS, formatting, etc.).

I'm using Ruby + Nokogiri (which is a set of bindings to libxslt), but any language is possible.

If XSLT is a bad fit for this task, what other tools could be used, given that I must transform ~1M users in several minutes with low memory consumption?

An additional benefit would be the ability to parallelize the processing.

Thank you.

Valentin V asked Mar 06 '26 20:03

1 Answer

With pure XSLT 1.0 you can't create multiple result documents in a single transformation, which is what you seem to want. For that you need either an XSLT 2.0 processor such as Saxon 9 or AltovaXML, using the XSLT 2.0 instruction xsl:result-document, or an XSLT 1.0 processor such as xsltproc/libxslt that implements the EXSLT extension element exsl:document (http://www.exslt.org/exsl/elements/document/index.html). If you can use one of them, XSLT is well suited to your task.
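For the XSLT 2.0 route, a minimal sketch might look like the following. It assumes an XSLT 2.0 processor such as Saxon 9 and the input structure from the question; the output file name pattern `member{@id}.html` is an assumption for illustration, not something the question specifies.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<!-- Index ministry elements by the value of their id child -->
<xsl:key name="ministry-by-id" match="ministry" use="id"/>

<xsl:template match="/">
  <xsl:for-each select="society/party_members/member">
    <!-- One result document per member; the file name pattern is an assumption -->
    <xsl:result-document href="member{@id}.html" method="html" indent="yes">
      <html>
        <body>
          <!-- normalize-space drops the stray space when a name part is empty -->
          <h2>Party member: <xsl:value-of select="normalize-space(concat(@first_name, ' ', @last_name))"/></h2>
          <h3>Works in:</h3>
          <!-- Look up every ministry whose id matches one of this member's ministry_id children -->
          <xsl:for-each select="key('ministry-by-id', ministry_id)">
            <div class="ministry">
              <h4><xsl:value-of select="long_title"/></h4>
              <xsl:text> - </xsl:text>
              <xsl:value-of select="short_title"/>
            </div>
          </xsl:for-each>
        </body>
      </html>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Relative `href` values are resolved against the base output URI of the transformation, so where the files end up depends on how the processor is invoked.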

[edit] With libxslt (or xsltproc), the following stylesheet code

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="exsl"
  extension-element-prefixes="exsl"
  version="1.0">

<xsl:output method="html" indent="yes"/>

<!-- Index ministry elements by the value of their id child -->
<xsl:key name="ministry-by-id" match="ministry" use="id"/>

<xsl:template match="/">
  <xsl:apply-templates select="society/party_members/member" mode="doc"/>
</xsl:template>

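<xsl:template match="member" mode="doc">
  <!-- One output file per member; .html because the result is an HTML document -->
  <exsl:document href="member{@id}.html">
    <html>
      <body>
        <!-- normalize-space drops the stray space when a name part is empty -->
        <h2>Party member: <xsl:value-of select="normalize-space(concat(@first_name, ' ', @last_name))"/></h2>
        <h3>Works in:</h3>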
        <xsl:apply-templates select="key('ministry-by-id', ministry_id)"/>
      </body>
    </html>
  </exsl:document>
</xsl:template>

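<xsl:template match="ministry">
  <div class="ministry">
    <h4><xsl:value-of select="long_title"/></h4>
    <!-- Short title after the heading, as in the requested output -->
    <xsl:text> - </xsl:text><xsl:value-of select="short_title"/>
    <!-- xsl:text keeps the space that would otherwise be stripped as a whitespace-only text node -->
    <h5><xsl:value-of select="long_title"/><xsl:text> </xsl:text><i>concerns</i> itself with <i><xsl:value-of select="normalize-space(concerns)"/></i></h5>
  </div>
</xsl:template>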

</xsl:stylesheet>

shows how to use exsl:document to output several result documents with one transformation. It also uses a key, so that each member's ministries are looked up by id instead of being found by comparing against every ministry element. Let us know whether that code works for your huge input data.
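To make the role of the key concrete, here is a rough sketch of the same lookup written without it. This is only an illustration against the input from the question: it writes plain text (member id followed by the short titles of its ministries) rather than HTML files, and that output layout is an assumption chosen to keep the example small.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:for-each select="society/party_members/member">
    <xsl:value-of select="@id"/>
    <xsl:text>: </xsl:text>
    <!-- No key: the predicate tests each ministry's id against this
         member's ministry_id children; current() is the member -->
    <xsl:for-each select="/society/ministries/ministry[id = current()/ministry_id]">
      <xsl:value-of select="short_title"/>
      <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Both forms select the same ministry elements. Without the key, a processor may end up comparing every member against all 300-400 ministries, which adds up over millions of members; `key('ministry-by-id', ministry_id)` lets it build an index once and reuse it.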

Martin Honnen answered Mar 08 '26 10:03