
Data in XML files: One large file or multiple small ones?

I am currently working on an XML-based CMS that saves data in chunks called "items". These can be used on the website to display content.

At the moment I have one separate XML file for every item. Since most pages on the website use about three to four of these items, even a rather small website with, say, 20 pages has up to 100 different items, and therefore the same number of XML files in my /xml/items folder.

Would it be preferable to store all that data in one single items.xml file or is my current approach the better one?

Pro Single File - xml/items.xml

  • Fewer files (the number of files may become a performance issue once a larger website has thousands of items.)
  • Less disk access (especially in the administration with a list of all items)

Pro Multiple Files - xml/items/*.xml

  • Faster to access one single item since only one small file needs to be parsed
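
To make the trade-off concrete, here is a minimal Python sketch of both lookups. The file layout, the `<item id="...">` element shape, and both function names are assumptions for illustration only; the question does not describe the actual CMS code or schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_item_from_own_file(items_dir: Path, item_id: str) -> ET.Element:
    """Multiple-files layout: parse only the one small file for this item."""
    return ET.parse(items_dir / f"{item_id}.xml").getroot()


def load_item_from_single_file(items_file: Path, item_id: str) -> ET.Element:
    """Single-file layout: parse the whole file, then search for the item."""
    root = ET.parse(items_file).getroot()
    for item in root.iter("item"):
        # Hypothetical schema: each item carries its identifier in an "id" attribute.
        if item.get("id") == item_id:
            return item
    raise KeyError(item_id)
```

With the single file, every lookup pays for parsing all items, while an admin list of all items needs only one read. With separate files it is the other way round.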

Jörg asked Aug 21 '09 11:08



3 Answers

Many thoughtful responses here already.

Either one big file or many small files should work just fine. The areas of concern are more likely to be administration and maintenance. If it's difficult to maintain items because they are spread across many different files, then maybe one big file is the answer.

Some thoughts:

  • One big file means that a single mistake (invalid XML) could take down the whole application, while with many files a mistake would only affect the pages using that item. This is mitigated by not editing data in production.

  • Does each server have its own items file structure? Or are the items located in a single highly available share? The more copies of the data you have lying around, the more likely it is that the data gets out of sync on a particular server, which might be hard to track down.

  • Whether you choose one file or many files, you can likely solve or abstract any data access problems (locking, searching, etc.) in code. The more code you have to write for things like locking and searching, the more bugs you're likely to have to debug.

  • Consider caching items for a period of time to avoid disk access if performance becomes a problem.
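
As a rough illustration of the caching idea, here is a small time-based cache in Python. The `ItemCache` class, the one-file-per-item layout, and the 60-second default are all assumptions made for this sketch, not part of any existing CMS.

```python
import time
import xml.etree.ElementTree as ET
from pathlib import Path


class ItemCache:
    """Keep parsed items in memory for ttl_seconds before re-reading from disk."""

    def __init__(self, items_dir, ttl_seconds=60.0, clock=time.monotonic):
        self._items_dir = Path(items_dir)
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}       # item_id -> (expires_at, parsed root element)

    def get(self, item_id):
        now = self._clock()
        entry = self._entries.get(item_id)
        if entry is not None and now < entry[0]:
            return entry[1]      # still fresh: no disk access, no parsing
        # Hypothetical layout: one file per item, named after the item id.
        root = ET.parse(self._items_dir / f"{item_id}.xml").getroot()
        self._entries[item_id] = (now + self._ttl, root)
        return root
```

The cost is staleness: an edited item stays invisible until its entry expires, so the admin interface would either need a short TTL or a way to drop the entry when it saves.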

You might want to check out Scott Hanselman's dasBlog blogging engine. I believe it is essentially an xml/text file based content management system that took the many file approach and it might be helpful to review.

Zach Bonham answered Oct 19 '22 10:10


I think your current approach is the better of the two alternatives. Given that your users edit the files through an interface you create, they will not be searching through a directory with many files anyway.

Given what it takes to corrupt a file, an advantage of many files is that corruption hits only a single file rather than everything at once. Locking is also better, as only one file at a time is locked for writing instead of the complete 'master XML file'.
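
A minimal Python sketch of a per-item save along these lines is below. The `save_item` function and the file naming are hypothetical. Note that it does not implement locking: it only checks well-formedness first and then swaps the file in with `os.replace`, so a failed write cannot leave a half-written item behind.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def save_item(items_dir: Path, item_id: str, xml_text: str) -> None:
    """Write one item to its own file without touching any other item."""
    # Refuse to store XML that is not well-formed (raises ET.ParseError).
    ET.fromstring(xml_text)

    target = items_dir / f"{item_id}.xml"
    # Write to a temporary file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=items_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(xml_text)
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)  # clean up the temporary file on any failure
        raise
```

With one big file the same routine would have to rewrite every item on each save, so a single bad save puts all items at risk instead of one.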

Thies answered Oct 19 '22 10:10


Will your users work with the XML files directly, or are the files simply a way to store the data?

If the latter, it is a technical issue, and disk access and parsing speed are the relevant concerns.

If the former, the most important question is what makes the most sense for the user. You can then work around the technical issues with caching and such. So assuming the user works directly with the XML files, you have to ask yourself whether multiple files or a single file helps or hinders your user. If each item describes an individual component, and there are few or no relations with other items, I would put them in separate files. If you create a single file with lots of unrelated items, the user will spend a lot of time searching for the relevant item. With multiple files, they can use the file name to immediately select the right one.

beetstra answered Oct 19 '22 10:10