I'm fishing for approaches to a problem with XSLT processing.
Is it possible to use parallel processing to speed up an XSLT processor? Or are XSLT processors inherently serial?
My hunch is that XML can be partitioned into chunks which could be processed by different threads, but since I'm not really finding any documentation of such a feat, I'm getting skeptical. Is it possible to use StAX to concurrently chunk XML?
It seems that most XSLT processors are implemented in Java or C/C++, but I really don't have a target language. I just want to know if a multi-threaded XSLT processor is conceivable.
What are your thoughts?
Yes, you can do multithreading on a single-processor system, where the threads are time-sliced. On a multi-processor system, multiple threads execute simultaneously on different cores. For example, if there are two threads and two cores, each thread can run on its own core.
The XSLT processor operates on two inputs: the XML document to transform, and the XSLT stylesheet that is used to apply transformations on the XML. Each of these two can actually be multiple inputs.
Multithreading is a CPU (central processing unit) feature that allows two or more instruction threads to execute independently while sharing the same process resources. A thread is a self-contained sequence of instructions that can execute in parallel with other threads that are part of the same root process.
The Windows "Task Manager" only shows processes. (Otherwise you would see a lot of duplicates for everything, since almost all Windows apps are multi-threaded, including Chrome.) To see threads of a process, use Process Explorer or Process Hacker; both of them have a "Threads" tab in the process properties dialog.
"Saxon: Anatomy of an XSLT Processor" is an excellent article about XSLT processors, Saxon in particular. It covers multithreading.
Saxon, by the way, is available for both .NET and Java and is one of the best processors available.
As in most programming languages, looping is inherently parallelizable as long as you follow a couple of rules (no mutation of shared state, no dependencies between iterations). This is known as data parallelism.
Any looping constructs could be parallelized in XSLT fairly easily.
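As a rough illustration of the data-parallel idea, here is a minimal Python sketch. It is not XSLT and not how any particular processor works: `transform_item` is a hypothetical stand-in for the body of an `xsl:for-each`, and `SOURCE` is made-up sample input. Because the function reads only its own item and mutates nothing shared, the items can be mapped across threads and the results still come back in document order.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Made-up sample input for the sketch.
SOURCE = "<items><item>a</item><item>b</item><item>c</item></items>"


def transform_item(item):
    """Hypothetical stand-in for the body of an xsl:for-each.

    It depends only on its own element and mutates no shared state,
    which is what makes the loop safe to run in parallel.
    """
    return "<li>" + (item.text or "").upper() + "</li>"


def transform_all(xml_text, workers=4):
    root = ET.fromstring(xml_text)
    items = root.findall("item")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, regardless of
        # which thread finishes first.
        parts = list(pool.map(transform_item, items))
    return "<ul>" + "".join(parts) + "</ul>"


if __name__ == "__main__":
    print(transform_all(SOURCE))
```

In CPython the global interpreter lock means these threads show the structure rather than a real speedup; for CPU-bound work a process pool would be the usual swap.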
With similar rules against mutation and dependencies, you could parallelize most of an XSLT transformation as a kind of task-based parallelism.
First, fragment the whole document into tasks, segmented at XSLT command and text node boundaries; each task should be assigned a sequential index according to its position in the document (top to bottom).
Next, scatter the tasks to distinct XSLT processing functions, each running on a different thread; these processors will all need to be initialized with the same global state (variables, constants, etc.).
Finally, once all the transformations are complete, the controlling thread should gather the results (transformed strings) in index order and assemble them into the finished document.
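The three steps above can be sketched roughly as follows. This is a toy in Python under stated assumptions, not a real XSLT engine: `fragment`, `transform_fragment`, `transform_document`, and `GLOBALS` are hypothetical names, the fragments are simply the top-level child elements, and the "transformation" is a trivial rewrite with no XML escaping.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import xml.etree.ElementTree as ET

# Read-only state shared by every worker (hypothetical example values).
GLOBALS = {"prefix": "item-"}


def fragment(xml_text):
    """Step 1: split the document into (index, fragment) tasks."""
    root = ET.fromstring(xml_text)
    return list(enumerate(root))  # index follows document order


def transform_fragment(index, element, global_state):
    """Step 2: transform one fragment; touches no shared mutable state."""
    text = global_state["prefix"] + (element.text or "")
    return index, "<p>" + text + "</p>"


def transform_document(xml_text, workers=4):
    tasks = fragment(xml_text)
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(transform_fragment, i, el, GLOBALS) for i, el in tasks
        ]
        # Fragments may finish in any order...
        for future in as_completed(futures):
            index, output = future.result()
            results[index] = output
    # Step 3: ...so the controlling thread reassembles strictly by index.
    body = "".join(results[i] for i in sorted(results))
    return "<doc>" + body + "</doc>"
```

The sketch only works because each fragment is independent. Templates that look at siblings or ancestors (for example through `preceding-sibling::` or `position()`) would need that context handed over with the task.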