Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Quadratic runtime for Delphi XML (MSXML) routines

At work, we're using a XML log file. Each log message is a <message> block with <date> and <time> subnodes, there are <submessage> blocks, <table> constructs and so on and the log file can be transformed to a localized HTML using some Delphi processing and XSLT later.

For medium sized log files (around 2 MB), we got performance problems (up to a minute for loading the XML and doing some basic manipulations) and I could reduce them to a test project like this (EDIT: Updated the code and added measurements):

procedure TForm1.PrepareTest(MessageCount : integer);
var
  XML : IXMLDocument;
  i : integer;
begin
  XML := NewXMLDocument;
  XML.DocumentElement := XML.CreateNode('root');
  for i := 1 to MessageCount do
  begin
    XML.DocumentElement.AddChild('message').Text := 'Test Text';
  end;
  XML.SaveToFile(XML_NAME);
end;

procedure TForm1.XMLTest;
var
  StartTime : Cardinal;
  XML : IXMLDocument;
begin
  StartTime := GetTickCount();
  XML := NewXMLDocument;
  XML.LoadFromFile(XML_NAME);
  Memo1.Lines.Add('Node count: ' + IntToStr(XML.DocumentElement.ChildNodes.Count));
  Memo1.Lines.Add('Time: ' + FloatToStr((GetTickCount() - StartTime) / 1000) + ' seconds');
end;

This results in the following time measurements (node count is increased by 25% per column, all times in milliseconds):

Node count      8000    10000   12500   15625   19531   24413   30516   38145   47681
Base test time  484     781     1140    1875    2890    4421    6734    10672   16812
Variation 1                             32      47      62      78      78      141
Variation 2     2656    3157    3906    5015    6532    8922    12140   17391   24985
  (delta Base)   2172    2376    2766    3140    3642    4501    5406    6719    8173

Note the two variations, the first is LoadFromFile only and the second one is additionally adding 10000 more nodes to the beginning (!) of the XML like PrepareTest does, which is worst case but looking at the delta to the base test, even this doesn't show quadratic effects. Also note that counting the nodes can be replaced with any other operation, so it looks like there is some delayed initialization/validation of the XML file involved that causes the problem and any manipulation afterwards shows expected behaviour.

Memory usage is not high, the last test case (47681 nodes) has a peak memory usage of 39 MB, its XML filesize is 1.3 MB.

The first thing that is done after loading the XML (e.g. reading or writing some nodes or accessing node count) is slow and it shows quadratic runtime behaviour, so any log file above 10 MB is unusable.

We already solved the performance problems along with some other things by parsing small chunks of 100 messages, and I'm aware that the Delphi XML routines are not suited/overkill for this use case - using a different XML library would most likely stop the performance problems. So I'm not asking for a solution for the problem (although it would be interesting to know if the problem can be solved without using a different XML library).

My question is: What is the reason for the quadratic runtime behaviour of the Delphi XML routines, respectively MSXML? I can't imagine things that would lead to this in XML loading/parsing/validating, besides really "stupid" things like managing the nodes in a linked list instead of a tree, but I might be overlooking something, perhaps DOM related.

like image 648
schnaader Avatar asked Jan 16 '13 19:01

schnaader


1 Answers

I agree with mj2008 that XML isn't a good fit for logging. That said, this problem and large XML files in general, could get processed faster by using SAX, it throws events while parsing the incoming stream of XML data, which lets you process the items as they are read from disk, in effect mitigating the exponentiality of loading it all in memory before handing it to XSLT.

I regret I have not done SAX in Delphi (yet), but I suspect the hardest part is implementing the required SAX interfaces (e.g. ISAXContentHandler), but Delphi has TInterfacedObject and TAutoObject and other just for that.

like image 123
Stijn Sanders Avatar answered Nov 15 '22 15:11

Stijn Sanders