I recently started a WPF application. I connected that to a BaseX (XML-based) database and retrieved about one million entries from it. I wanted to iterate over the entries, calculate something for each entry and then write that back to the database:
IEnumerable<Result> resultSet = baseXClient.Query("...", "database");
foreach (Result result in resultSet)
{
...
}
The problem: the inside of the foreach is never reached. The Query() method returns pretty fast, but when the foreach is reached, C# seems to do SOMETHING with the collection, and execution does not continue for a very long time (at least 10 minutes; I never let it run any longer). What's going on here? I tried to limit the number of items retrieved. When retrieving 100,000 results, the same thing occurs, but the code continues after about 10-20 seconds. When retrieving the full one million results, C# seems to be stuck forever...
Any ideas? Regards
Edit: Why this is happening
As some of you pointed out, the reason for this behavior seems to be that the query is actually only evaluated when MoveNext() on the Enumerator inside the Enumerable is called. My database seems unable to return one value at a time, but instead returns the entire one million dataset at once. I will try to switch to another database (Apache Lucene, if possible, as it has good fulltext search support) and edit this post to let you know if it changed anything.
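To illustrate the deferred evaluation described above, here is a minimal sketch. It is not the BaseX client's code: BufferedQuery and StreamingQuery are hypothetical stand-ins, written as C# iterator methods, for a source that loads everything on the first MoveNext() and a source that hands out one item per MoveNext().

```csharp
using System;
using System.Collections.Generic;

public static class DeferredExecutionDemo
{
    // Hypothetical stand-in for a client that loads the whole result set
    // as soon as enumeration starts (the behavior suspected in the question).
    public static IEnumerable<int> BufferedQuery(int count, List<string> log)
    {
        log.Add("fetch all");                  // runs on the first MoveNext(), not on the call
        var all = new List<int>();
        for (int i = 0; i < count; i++)
        {
            all.Add(i);                        // the entire result set is built up front
        }
        foreach (int item in all)
        {
            yield return item;
        }
    }

    // Hypothetical stand-in for a source that produces one item per MoveNext().
    public static IEnumerable<int> StreamingQuery(int count, List<string> log)
    {
        for (int i = 0; i < count; i++)
        {
            log.Add("fetch " + i);             // one fetch per loop iteration
            yield return i;
        }
    }

    public static void Main()
    {
        var log = new List<string>();
        IEnumerable<int> results = BufferedQuery(3, log);
        Console.WriteLine(log.Count);          // prints 0: the method body has not run yet

        foreach (int r in results)             // the first MoveNext() triggers "fetch all"
        {
            Console.WriteLine(r);
        }
    }
}
```

With the buffered variant, all of the loading cost is paid before the first loop iteration, which would match a foreach that appears to hang before its body is ever reached.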
PS: Yes, I am aware that one million results is a lot. This is not meant for live usage, it is just a step for preparing the data. While I didn't expect the code to run in a few seconds, I was still surprised to see SUCH poor performance in the database.
Edit: The Solution
So I migrated the XML database to Apache Lucene. Works like a charm! Of course, Lucene is a text-based database that is not suitable for every use case, but for me it worked wonders. I can iterate over one million entries in a few seconds, with one entry fetched per loop iteration. It works extremely well!
Let me guess: you are NOT loading the data when you create the resultSet, but when it is first accessed (delayed execution), and loading one million entries just takes a lot of time to deserialize them into memory.
Welcome to the inefficiencies of XML databases.
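One way to check this guess is to time the first MoveNext() call by hand. The sketch below is only an illustration: TimeToFirstItem is a hypothetical helper, and SlowToStart is a made-up source that simulates a slow initial load with Thread.Sleep.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public static class FirstItemTimer
{
    // Hypothetical helper: measures how long the source takes to produce its first item.
    public static TimeSpan TimeToFirstItem<T>(IEnumerable<T> source)
    {
        var watch = Stopwatch.StartNew();
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            enumerator.MoveNext();             // deferred work, if any, happens here
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Made-up source that simulates an expensive load before the first item.
    public static IEnumerable<int> SlowToStart()
    {
        Thread.Sleep(200);                     // stands in for loading and deserializing
        yield return 1;
    }

    public static void Main()
    {
        TimeSpan elapsed = TimeToFirstItem(SlowToStart());
        Console.WriteLine("First item after " + elapsed.TotalMilliseconds + " ms");
    }
}
```

If passing the real resultSet to such a helper accounts for nearly all of the waiting time, the delay sits in the initial load rather than in the loop body.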