
Large XML Files and Pagination, is it possible?

The problem

Opening a very large XML file on a local machine almost always takes an age, and the application often locks up to the point where the computer reports it as not responding.

This is an issue if you serve users XML backups of rather complex databases or systems they use: the likelihood of them being able to open large backups, let alone use them, is slim.

Is pagination possible?

I use XSLT to present readable backups to users. In the same way, would it be possible to pull in only one page of data at a time, so that the entire file is not read in one go and the issues above are avoided?

I imagine the answer is simply a no - but I would like to know if anyone else has seen the same issues and resolved them.

Note: This is on a local machine only; it must not require an internet connection. JavaScript can be used if it makes things easier.

jakeisonline asked Jan 06 '10 15:01



2 Answers

Pagination with XSLT is possible, but it will probably not give you the result you want: for XSLT to work, the whole XML document must first be parsed into a DOM tree, so the full file is still read even when only one page is output.

What you could do is experiment with streaming transformations (STX): http://stx.sourceforge.net/

Or you could preprocess the large XML file to cut it up into smaller pieces before processing them with XSLT. For this I'd use a command-line tool like XMLStarlet.
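
The preprocessing idea can also be sketched without XMLStarlet. What follows is a minimal Python sketch, not the tool suggested above. It assumes the backup is a single root element whose direct children are the records; the names `split_into_pages`, `record_tag` and `page_size`, and the `<page>` wrapper element, are made up for illustration. It relies on `xml.etree.ElementTree.iterparse` from the standard library, which reads the file incrementally instead of building the whole tree first.

```python
import xml.etree.ElementTree as ET


def split_into_pages(source, record_tag, page_size):
    """Yield small XML documents holding at most page_size records each.

    Assumption: every record is a direct child of the root element.
    """
    page = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            elem.tail = None  # drop whitespace that followed the record
            page.append(ET.tostring(elem, encoding="unicode"))
            root.clear()  # forget finished records so memory stays flat
            if len(page) == page_size:
                yield "<page>" + "".join(page) + "</page>"
                page = []
    if page:  # last, possibly shorter, page
        yield "<page>" + "".join(page) + "</page>"
```

Each yielded string is a small standalone document that could be written to its own file and handed to the existing XSLT stylesheet. Because the function is a generator, stopping after the first few pages means the rest of the file is never parsed.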

chiborg answered Sep 21 '22 23:09



Right on, very good question!

The XSLT implementations I know require a DOM, so they are bound to access the entire document (although that could perhaps be done in a lazy fashion).

Anyway, you should take a look at VTD-XML: http://vtd-xml.sourceforge.net/

The latest SAXON XSLT processor also has rudimentary support for what is called "Streaming XSLT". Read about that here: http://www.saxonica.com/documentation/index/intro.html

That said, database backups are probably not the right use case for XML. If you have to deal with XML database backups, I would try to get away from those as fast as possible. The same goes for logs: a linear process should work by simply appending things, and the single root element that XML requires gets in the way of that. I mean, it would be even better if XML allowed a forest as the top-level structure, but I think that is never going to happen.
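
To illustrate the point about appending, here is a minimal Python sketch of a log kept as a rootless "forest" of elements on disk. It is an assumption-laden illustration, not an established format: the function names `append_entry` and `read_entries` and the synthetic `<log>` root are invented here, and entries are assumed to be flat elements with no children. The file itself is not a well-formed XML document; the reader supplies a root element on the fly with the standard library's `XMLPullParser`.

```python
import xml.etree.ElementTree as ET


def append_entry(path, tag, text):
    """Append one self-contained element to the log; nothing is rewritten."""
    elem = ET.Element(tag)
    elem.text = text
    with open(path, "a", encoding="utf-8") as log:
        log.write(ET.tostring(elem, encoding="unicode") + "\n")


def read_entries(path):
    """Parse the rootless log by feeding it between a synthetic root's tags."""
    parser = ET.XMLPullParser(events=("end",))
    parser.feed("<log>")  # synthetic root, never stored in the file
    with open(path, encoding="utf-8") as log:
        for line in log:
            parser.feed(line)
            # Entries are assumed flat, so every "end" event is one entry.
            for _, elem in parser.read_events():
                yield elem
    parser.feed("</log>")
    parser.close()
```

Writing never touches earlier bytes, so the cost of adding an entry does not grow with the size of the log. The trade-off is that ordinary XML tools will reject the file unless it is wrapped in a root element first.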

Roland Bouman answered Sep 19 '22 23:09
