I am using LinqToSQL to process data from SQL Server to dump it into an iSeries server for further processing. More details on that here.
My problem is that it is taking about 1.25 minutes to process those 350 rows of data. I am still trying to decipher the results from the SQL Server Profiler, but there are a TON of queries being run. Here is a bit more detail on what I am doing:
using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
{
var vehicles = from a in db.EquipmentMainGenerals
join b in db.EquipmentMainConditions on a.wdEquipmentMainGeneralOID equals b.wdEquipmentMainGeneralOID
where b.Retired == null
orderby a.VehicleId
select a;
et = new EquipmentTable[vehicles.Count()];
foreach (var vehicle in vehicles)
{
// Move data to the array
// Rates
GetVehcileRates(vehicle.wdEquipmentMainGeneralOID);
// Build the costs accumulators
GetPartsAndOilCosts(vehicle.VehicleId);
GetAccidentAndOutRepairCosts(vehicle.wdEquipmentMainGeneralOID);
// Last Month's Accumulators
et[i].lastMonthActualGasOil = GetFuel(vehicle.wdEquipmentMainGeneralOID) + Convert.ToDecimal(oilCost);
et[i].lastMonthActualParts = Convert.ToDecimal(partsCost);
et[i].lastMonthActualLabor = GetLabor(vehicle.VehicleId);
et[i].lastMonthActualOutRepairs = Convert.ToDecimal(outRepairCosts);
et[i].lastMonthActualAccidentCosts = Convert.ToDecimal(accidentCosts);
// Move more data to the array
i++;
}
}
The Get methods all look similar to:
private void GetPartsAndOilCosts(string vehicleKey)
{
oilCost = 0;
partsCost = 0;
using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
{
try
{
var costs = from a in db.WorkOrders
join b in db.MaterialLogs on a.WorkOrderId equals b.WorkOrder
join c in db.Materials on b.wdMaterialMainGeneralOID equals c.wdMaterialMainGeneralOID
where (monthBeginDate.Date <= a.WOClosedDate && a.WOClosedDate <= monthEndDate.Date) && a.EquipmentID == vehicleKey
group b by c.Fuel into d
select new
{
isFuel = d.Key,
totalCost = d.Sum(b => b.Cost)
};
foreach (var cost in costs)
{
if (cost.isFuel == 1)
{
oilCost = (double)cost.totalCost * (1 + OVERHEAD_RATE);
}
else
{
partsCost = (double)cost.totalCost * (1 + OVERHEAD_RATE);
}
}
}
catch (InvalidOperationException e)
{
oilCost = 0;
partsCost = 0;
}
}
return;
}
My thinking here is cutting down the number of queries to the DB should speed up the processing. If LINQ does a SELECT for every record, maybe I need to load every record into memory first.
I still consider myself a beginner with C# and OOP in general (I do mostly RPG programming on the iSeries). So I am guessing I am doing something stupid. Can you help me fix my stupidity (at least with this problem)?
Update: Thought I would come back and update you on what I have discovered. It appears like the database was poorly designed. Whatever LINQ was generating in the background it was highly inefficient code. I am not saying the LINQ is bad, it just was bad for this database. I converted to a quickly thrown together .XSD setup and the processing time went from 1.25 minutes to 15 seconds. Once I do a proper redesign, I can only guess I'll shave a few more seconds off of that. Thank you all for you comments. I'll try LINQ again some other day on a better database.
The time taken just to read data from an array or a dataset should be the same. The difference (although as Cor points out, the difference may be very small) is how long it takes to find the data.
A STRUCT is similar conceptually to a table row: it contains a fixed number of named fields, each with a predefined type. To combine two related tables, while using complex types to minimize repetition, the typical way to represent that data is as an ARRAY of STRUCT elements.
There are a few things that I spot in your code:
select
. LINQ to SQL will analyze this and retrieve less data from your database. Such a select might look like this: select new { a.VehicleId, a.Name }
GetPartsAndOilCosts
can be optimized by putting the calculation cost.totalCost * (1 + OVERHEAD_RATE)
in the LINQ query. This way the query can be executed in the database completely, which should make it much faster.Count()
on the var vehicles
query, but you only use it for determining the size of the array. While LINQ to SQL will make a very efficient SELECT count(*)
query of it, it takes an extra round trip to the database. Besides that (depending on your isolation level) the time you start iterating the query an item could be added. In that case your array is too small and an ArrayIndexOutOfBoundsException
will be thrown. You can simply use .ToArray()
on the query or create a List<EquipmentTable>
and call .ToArray()
on that. This will normally be fast enough especially when you only have only 380 items in this collection and it will certainly be faster than having an extra roundtrip to the database (the count).A little extra explanation for point #1. What you're doing here is a bit like this:
var query = from x in A select something;
foreach (var row in query)
{
var query2 = from y in data where y.Value = row.Value select something;
foreach (var row2 in query2)
{
// do some computation.
}
}
What you should try to accomplish is to remove the query2
subquery, because it is executing on each row of the top query. So you could end up with something like this:
var query =
from x in A
from y in B
where x.Value == y.Value
select something;
foreach (var row in query)
{
}
Of course this example is simplistic and in real life it gets get pretty complicated (as you’ve already noticed). In your case also because you've got multiple of those 'sub queries'. It can take you some time to get this right, especially with your lack of knowledge of LINQ to SQL (as you said yourself).
If you can't figure it out, you can always ask again here at Stackoverflow, but please remember to strip your problem to the smallest possible thing, because it's no fun to read over someone's mess (we're not getting paid for this) :-) Good luck.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With