 

Can XSLT processors be multi-threaded?

I'm fishing for approaches to a problem with XSLT processing.

Is it possible to use parallel processing to speed up an XSLT processor? Or are XSLT processors inherently serial?

My hunch is that XML can be partitioned into chunks that could be processed by different threads, but since I'm not finding any documentation of such a feat, I'm getting skeptical. Is it possible to use StAX to concurrently chunk XML?
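In general, parsing itself stays sequential, because a parser has to read from the start to know the nesting context. A common workaround is a single streaming pass (StAX in Java; Python's `xml.etree.ElementTree.iterparse` is used below as a stand-in) that cuts the input into independent chunks, which can then be handed to workers. This is a minimal sketch, assuming the document is a flat list of sibling records with a hypothetical tag name, that records are not nested inside each other, and that each record can be transformed independently.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List


def chunk_records(source, tag: str, chunk_size: int) -> Iterator[List[bytes]]:
    """Stream `source` and yield lists of up to `chunk_size` serialized <tag> elements.

    The parse is serial; only the yielded chunks are independent units of work.
    Assumes (hypothetically) that <tag> elements are never nested in each other.
    """
    chunk: List[bytes] = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            elem.tail = None  # drop trailing text so each chunk is just the element
            chunk.append(ET.tostring(elem))
            elem.clear()  # release the already-serialized subtree
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
    if chunk:
        yield chunk


# Example: three <rec> records cut into chunks of two.
xml = b"<root><rec id='1'/><rec id='2'/><rec id='3'/></root>"
for c in chunk_records(io.BytesIO(xml), "rec", 2):
    print(len(c), c)
```

Each chunk is plain bytes, so it can be parsed and transformed again on another thread or process without sharing any parser state.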

It seems that most XSLT processors are implemented in Java or C/C++, but I really don't have a target language. I just want to know if a multi-threaded XSLT processor is conceivable.

What are your thoughts?

asked Nov 11 '09 07:11 by Ben Simmons

2 Answers

"Saxon: Anatomy of an XSLT Processor" is an excellent article about XSLT processors, Saxon in particular. It covers multithreading.

Saxon, by the way, is available for both .NET and Java and is one of the best processors available.

answered Sep 30 '22 01:09 by Peter Lindqvist


As in most programming languages, loops are inherently parallelizable as long as you follow a couple of rules; this is known as data parallelism:

  • No mutation of shared state in the loop
  • One iteration of the loop cannot depend on the outcome of another iteration

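Any looping construct, such as xsl:for-each, could be parallelized in XSLT fairly easily. Because XSLT variables are immutable, the first rule is largely satisfied by design.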
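To make the two rules concrete, here is a minimal Python sketch of data parallelism: a pure per-item function mapped over the items by a thread pool, with results returned in input order just as a serial loop would produce them. `render_item` is a hypothetical stand-in for the body of an xsl:for-each, not part of any XSLT processor's API. Note that in CPython, threads only run truly in parallel for I/O or code that releases the GIL; for CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_for_each(items: Iterable[T], body: Callable[[T], R], workers: int = 4) -> List[R]:
    """Apply `body` to every item concurrently and return results in input order.

    Only safe under the two rules: `body` must not mutate shared state,
    and no iteration may depend on another iteration's result.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(body, items))


def render_item(name: str) -> str:
    # Hypothetical loop body: a pure function of its input.
    return f"<li>{name.upper()}</li>"


print("".join(parallel_for_each(["a", "b", "c"], render_item)))
# prints <li>A</li><li>B</li><li>C</li>
```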

With similar rules against mutation and dependencies, you could parallelize most of an XSLT transformation using a form of task-based parallelism.

First, fragment the whole document into tasks, segmented at XSLT instruction and text node boundaries; each task should be assigned a sequential index according to its position in the document (top to bottom).

Next, scatter the tasks to separate XSLT processing functions, each running on its own thread; these processors all need to be initialized with the same global state (variables, constants, etc.).

Finally, once all the transformations are complete, the controlling thread should gather the results (transformed strings) in index order and assemble them into the finished document.
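The fragment / scatter / gather steps can be sketched in Python roughly as follows. This is an illustration of the general pattern under simplifying assumptions, not how any real XSLT processor is implemented: document fragments are plain strings, `transform` is a hypothetical stand-in for an XSLT processing function, and the shared global state is a read-only mapping (`types.MappingProxyType`) so no worker can mutate it. The same GIL caveat applies: for CPU-bound transforms in CPython, `ProcessPoolExecutor` has the same `submit` interface.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from types import MappingProxyType
from typing import Dict, List, Mapping, Tuple

# Shared global state (hypothetical values), wrapped read-only so that
# every worker sees the same data and none can modify it.
GLOBALS: Mapping[str, str] = MappingProxyType({"site": "example"})


def fragment(nodes: List[str]) -> List[Tuple[int, str]]:
    # Step 1: give each task a sequential index in document order.
    return list(enumerate(nodes))


def transform(task: Tuple[int, str], env: Mapping[str, str]) -> Tuple[int, str]:
    # Step 2: hypothetical per-task transformation; it only reads `env`.
    index, node = task
    return index, f"[{env['site']}] {node}"


def scatter_gather(nodes: List[str], workers: int = 4) -> str:
    tasks = fragment(nodes)
    results: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(transform, t, GLOBALS) for t in tasks]
        # Tasks may complete in any order...
        for fut in as_completed(futures):
            index, text = fut.result()
            results[index] = text
    # Step 3: ...so the controlling thread reassembles them by index.
    return "\n".join(results[i] for i in range(len(tasks)))


print(scatter_gather(["title", "para 1", "para 2"]))
```

Collecting with `as_completed` and then sorting by index mirrors the answer's point that completion order does not matter as long as each task carries its document position.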

answered Sep 30 '22 00:09 by joshperry