
GPU Accelerated XML Parsing

I need to improve the performance of a piece of software that parses XML files and adds their contents to a large SQL database. I have been trying to find out whether it is possible to implement this on a GPU. My research into both CUDA and OpenCL has left me without any clear answers, beyond the fact that software can be developed in C/C++, FORTRAN and many other languages, using compiler directives to enable GPU processing. This leads me to ask: do I actually need an API or library written for GPU acceleration, or would a program written in C/C++ that uses a standard XML parsing library, compiled with the compiler directives for CUDA/OpenCL, automatically run the XML library functions on the GPU?

Catachan asked Jul 25 '13 at 21:07


3 Answers

In general, GPUs are not suited to accelerating XML processing. GPUs are only great if the task has massive parallelism that can exploit their large number of processing units. XML processing, on the other hand, is largely a single-threaded, state-machine-transition type of job.
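To make the "state machine" point concrete, here is a deliberately simplified Python sketch. It is not from the answer and is not a real XML tokenizer: `count_tags` is a hypothetical function that ignores comments, CDATA and `>` inside attribute values. It shows that the state after each character depends on the state left by the previous character, which is the loop-carried dependency that makes naive splitting across many GPU threads hard.

```python
from enum import Enum


class State(Enum):
    TEXT = 0  # outside any markup
    TAG = 1   # between '<' and '>'


def count_tags(xml: str) -> int:
    """Count markup constructs with a character-level state machine.

    Simplified illustration only: comments, CDATA sections and '>' inside
    attribute values are not handled.
    """
    state = State.TEXT
    tags = 0
    for ch in xml:
        # The transition for this character depends on the state produced by
        # the previous one, so the loop cannot simply be split across threads.
        if state is State.TEXT and ch == "<":
            state = State.TAG
        elif state is State.TAG and ch == ">":
            state = State.TEXT
            tags += 1
    return tags
```

For example, `count_tags("<a><b>x</b></a>")` returns 4.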

vtd-xml-author answered Sep 21 '22 at 13:09


I actually don't see any sense in parsing XML on a GPU. GPU architecture is focused on massive floating-point calculations, not on operations like text processing. I think it is much better to use the CPU and split the XML parsing between threads to make use of multiple cores. In my opinion, using a GPU in such an application is overkill.
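The multi-core CPU approach can be sketched in Python, assuming, as the question suggests, that the input is many independent XML documents, so that each one can be parsed on its own. `extract_rows` and `parse_documents` are hypothetical names, and the `(tag, text)` extraction stands in for whatever columns your SQL tables actually need. In the default CPython build, `xml.etree.ElementTree` parsing holds the global interpreter lock, so a process pool is the usual way to get real parallelism.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import Executor


def extract_rows(xml_text: str) -> list[tuple[str, str]]:
    """Parse one document and return (tag, text) pairs for the root's children.

    Hypothetical extraction step: adapt it to the columns your database needs.
    """
    root = ET.fromstring(xml_text)
    return [(child.tag, (child.text or "").strip()) for child in root]


def parse_documents(documents: list[str], executor: Executor) -> list[list[tuple[str, str]]]:
    """Parse independent documents concurrently; results keep the input order.

    Pass a ProcessPoolExecutor to use several cores in CPython; a thread pool
    would mostly wait on the global interpreter lock during parsing.
    """
    return list(executor.map(extract_rows, documents))
```

In a real program you would call `parse_documents(docs, pool)` inside `with ProcessPoolExecutor() as pool:`, under an `if __name__ == "__main__":` guard, and then insert the returned rows into the database from the main process.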

Arkadiusz Wojcik answered Sep 18 '22 at 13:09


First, look at the structure of your XML. At the following link you can find criteria for XML structures that are suitable for parallel processing: Parallel XML Parsing in Java

If your XML structure can be processed in parallel, here are several ideas:

As far as I know, XML parsing needs a stack structure to remember the current position in the tree and to verify that nodes are opened and closed properly.

A stack can be represented as a one-dimensional array plus a stack pointer, which holds the position of the top element in the array.
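Here is a minimal Python sketch of that array-plus-stack-pointer representation, used to check that elements are closed in the right order. It assumes the input has already been tokenized into hypothetical `("open" | "close", name)` events, and the fixed `capacity` mimics a preallocated GPU buffer. It is an illustration, not a GPU implementation: a GPU version would hold numeric element IDs rather than strings.

```python
def tags_are_balanced(events: list[tuple[str, str]], capacity: int = 64) -> bool:
    """Check element nesting with a stack stored in a fixed-size 1D array.

    `events` is a pre-tokenized list of ("open" | "close", name) pairs; the
    tokenizer itself is out of scope here.
    """
    stack = [""] * capacity  # the 1D array backing the stack
    sp = -1                  # stack pointer: index of the top element, -1 when empty
    for kind, name in events:
        if kind == "open":
            if sp + 1 == capacity:
                raise OverflowError("nesting deeper than the array capacity")
            sp += 1
            stack[sp] = name
        else:  # "close" must match the element on top of the stack
            if sp < 0 or stack[sp] != name:
                return False
            sp -= 1
    return sp == -1  # every opened element was also closed
```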

Reportedly, you can store arrays in 1D textures (max. 4,096 elements) or in 2D textures (max. 16,777,216 = 4,096 x 4,096 elements). See the following link for more: https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter33.html
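A long 1D array is usually stored in a 2D texture through row-major address translation: element `i` goes to column `i % width` and row `i // width`. Below is a small Python sketch, assuming a square texture with the 4,096 per-dimension limit quoted above. Real limits depend on the GPU, and `texel_coords` and `linear_index` are hypothetical helper names.

```python
TEXTURE_WIDTH = 4096  # limit quoted in the answer; actual limits vary by GPU


def texel_coords(index: int, width: int = TEXTURE_WIDTH) -> tuple[int, int]:
    """Map a 1D array index to (x, y) texel coordinates in a row-major 2D texture."""
    if not 0 <= index < width * width:
        raise IndexError("index does not fit in a width x width texture")
    return index % width, index // width


def linear_index(x: int, y: int, width: int = TEXTURE_WIDTH) -> int:
    """Inverse mapping: (x, y) texel coordinates back to the 1D index."""
    return y * width + x
```

For example, index 4,096 maps to `(0, 1)`, the first texel of the second row.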

If you assign a separate floating-point number to each unique element name, then you can store elements as numbers.
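A minimal Python sketch of assigning a number to each unique element name follows. IDs are handed out in order of first appearance, which is an arbitrary choice made here, and `intern_element_names` is a hypothetical name. If the values are later uploaded as 32-bit floats, they stay exact as long as there are at most 2**24 (16,777,216) distinct names, because float32 represents every integer up to that value exactly.

```python
def intern_element_names(names: list[str]) -> tuple[list[float], dict[str, float]]:
    """Replace element names by numeric IDs, assigned in order of first appearance.

    Returns the encoded sequence and the name -> ID table needed to decode it.
    """
    ids: dict[str, float] = {}
    encoded: list[float] = []
    for name in names:
        if name not in ids:
            ids[name] = float(len(ids))  # next unused ID
        encoded.append(ids[name])
    return encoded, ids
```

For example, `["book", "title", "author", "title", "book"]` encodes to `[0.0, 1.0, 2.0, 1.0, 0.0]`.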

If you treat the input text as an array of ASCII/UTF-8 codes, why not store it as an array of floating-point numbers?
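Here is a Python sketch of that idea, using the standard `array` module with type code `"f"` (C `float`, 32 bits on common platforms). Each UTF-8 byte (0 to 255) becomes one float. The round trip is lossless because float32 holds these small integers exactly, although it uses four times the memory of the raw bytes. `text_to_floats` and `floats_to_text` are hypothetical helper names.

```python
from array import array


def text_to_floats(text: str) -> array:
    """Encode text as UTF-8 and store each byte (0-255) as a 32-bit float."""
    # Pass a list of numbers: passing a bytes object to array() would
    # reinterpret the raw bytes as floats instead of converting each byte.
    return array("f", [float(b) for b in text.encode("utf-8")])


def floats_to_text(values: array) -> str:
    """Inverse of text_to_floats: floats back to bytes, then decode as UTF-8."""
    return bytes(int(v) for v in values).decode("utf-8")
```

Note that a non-ASCII character such as `é` occupies two bytes, and therefore two floats.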

The last important thing to consider when using a GPU is the structure of the output.

If you need, e.g., table rows with fixed-length columns, then it is only a question of how to represent such a structure as a 1D or 2D array of floats.
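One simple representation is a row-major flattening of fixed-length rows into a single float array, where cell `(r, c)` lands at index `r * ncols + c`. The Python sketch below assumes every column is already numeric (string columns would first need to be mapped to IDs, as suggested for element names). `rows_to_floats` and `floats_to_rows` are hypothetical names.

```python
from array import array


def rows_to_floats(rows: list[list[float]], ncols: int) -> array:
    """Flatten fixed-length rows row-major: cell (r, c) lands at r * ncols + c."""
    flat = array("f")  # 32-bit floats, as a GPU buffer would typically hold
    for row in rows:
        if len(row) != ncols:
            raise ValueError("every row must have exactly ncols columns")
        flat.extend(row)
    return flat


def floats_to_rows(flat: array, ncols: int) -> list[list[float]]:
    """Inverse of rows_to_floats: split the flat array back into rows."""
    return [list(flat[i:i + ncols]) for i in range(0, len(flat), ncols)]
```

Values that float32 cannot represent exactly (for example 0.1) come back slightly rounded, so this suits IDs, counts and small integers better than arbitrary decimals.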

Once you are sure about the previous points and that a GPU is right for you, just write functions to convert your data to textures and the textures back to your data.

And then, of course, the whole XML parser...

I have never tried GPU programming at all, but it seems too early to me to say that something is impossible...

Someone has to be the first to build the whole algorithm and test whether using a GPU is efficient or not.

Jirka Jr. answered Sep 20 '22 at 13:09