I have a service running which periodically checks a folder for a file and then processes it (reads it, extracts the data, stores it in SQL).
So I ran it on a test box and it took a little longer than expected. The file had 1.6 million rows, and it was still running after 6 hours (then I went home).
The problem is the box it is running on is now absolutely crippled - remote desktop was timing out, so I can't even get on it to stop the process, or attach a debugger to see how far through it is. It's solidly using 90%+ CPU, and all other running services and apps are suffering.
The code is (from memory, may not compile):
List<ItemDTO> items = new List<ItemDTO>();
using (StreamReader sr = fileInfo.OpenText())
{
    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();
        try
        {
            string s = line.Substring(0, 8);
            double y = Double.Parse(line.Substring(8, 7));
            // If the item isn't already in the collection, add it.
            if (items.Find(delegate(ItemDTO i) { return i.Item == s; }) == null)
                items.Add(new ItemDTO(s, y));
        }
        catch { /*Crash*/ }
    }
    return items;
}
- So I am working on improving the code (any tips appreciated).
But it could still be a slow affair, which is fine - I've no problem with it taking a long time as long as it's not killing my server.
So what I want from you fine people is: 1) Is my code hideously un-optimized? 2) Can I limit the amount of CPU my code block may use?
Cheers all
Rather than limit its CPU usage, you'd probably be better off setting it to idle priority, so it'll only run when there's nothing else for the box to do. Others have already mentioned optimization possibilities, so I won't try to get into that part.
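Here's a rough sketch of what that could look like in C# (untested; it drops the whole service process, with a thread-level alternative commented alongside):

using System.Diagnostics;
using System.Threading;

// Drop the whole process to idle priority so the scheduler only gives it
// CPU time when nothing else on the box wants it. Call once at start-up.
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.Idle;

// Or, if only the import thread should yield, lower just that thread instead:
Thread.CurrentThread.Priority = ThreadPriority.Lowest;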
Doing the Find on the List is an O(n) operation, which means that as the list gets longer it takes longer to search for items. You could put the items into a HashSet (.NET 3.5/4.0) or a Dictionary (earlier versions of .NET), which acts like an index. If you need the items to keep their original order, continue adding them to the List, but use the HashSet/Dictionary to do the duplicate checks, as in the sketch below.
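Something along these lines (untested, reusing the fileInfo and ItemDTO from your question) keeps the List for ordering but lets a HashSet do the duplicate check:

HashSet<string> seen = new HashSet<string>();
List<ItemDTO> items = new List<ItemDTO>();

using (StreamReader sr = fileInfo.OpenText())
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        string s = line.Substring(0, 8);
        double y = Double.Parse(line.Substring(8, 7));

        // HashSet.Add returns false if the key is already present,
        // so this single O(1) call replaces the linear List.Find scan.
        if (seen.Add(s))
            items.Add(new ItemDTO(s, y));
    }
}
return items;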
You could also run this code in a BackgroundWorker thread; this will help keep the UI responsive while the process is running.
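For example (a minimal sketch; ParseFile is a made-up name for a method wrapping the read loop above):

using System.ComponentModel;

BackgroundWorker worker = new BackgroundWorker();

worker.DoWork += delegate(object sender, DoWorkEventArgs e)
{
    // Runs on a thread-pool thread, off the main thread.
    e.Result = ParseFile(fileInfo);
};

worker.RunWorkerCompleted += delegate(object sender, RunWorkerCompletedEventArgs e)
{
    // Back on the calling thread; hand the parsed items to the SQL step here.
    List<ItemDTO> items = (List<ItemDTO>)e.Result;
};

worker.RunWorkerAsync();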
Find on a List is O(n). If the file has 1.6 million rows (i.e. 1.6 million items), you'll be repeatedly walking a list of a million-plus items, which wastes a lot of time.
As others have suggested, if you're doing a lot of searching you need a better data structure - one that is designed for faster searches.
If you're using .NET 3.5, you can use the HashSet collection, which gives you amortized O(1) searches, or the Dictionary collection if you're on .NET 2.0.
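On .NET 2.0 the same idea might look like this (rough sketch); a Dictionary keyed on the item code acts as the index, and you may not need the separate List at all:

Dictionary<string, ItemDTO> itemsByCode = new Dictionary<string, ItemDTO>();

// Inside the read loop, replacing the List.Find scan:
if (!itemsByCode.ContainsKey(s))
    itemsByCode.Add(s, new ItemDTO(s, y));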
Next you have to ask yourself: if the file has 1.6 million rows, do you have enough memory? If you do, then parsing the file in memory will be faster than sending it to the database to weed out duplicates; but if you don't have enough memory, you'll be paging. A lot. (Which is probably what is happening now.)