I need to manipulate 100,000 to 200,000 records.
I am thinking of using LINQ (to SQL) to do this.
I know from experience that filtering dataviews is very slow.
So how quick is LINQ?
Can you please share your experiences and tell me whether it is worth using, or would I be better off using SQL stored procedures (heavy going and less flexible)?
Within those records I need to find groups of data and then process them. Each group has about 50 records.
LINQ over in-memory collections is typically less efficient than a foreach loop. It's good to be aware of any performance tradeoff that might occur when you use LINQ to improve the readability of your code.
The results show right away that LINQ is a lot slower than raw SQL, though compiled LINQ is a bit faster. Note that these results are in microseconds; real-world queries may take tens or even hundreds of milliseconds, so the LINQ overhead will hardly be noticeable.
No, LINQ iterators are not and will never be faster than foreach. Also, List.Exists is not a LINQ method; it is a method on List<T> itself.
LINQ requires more computation time and creates more garbage because of the allocations (iterators, delegates and, in some cases, boxing) that go on behind the scenes. For context, dotnetperls did a loop vs LINQ test over some data and found that LINQ was almost 10x slower.
LINQ to SQL translates your query expression into T-SQL, so your query performance should be essentially the same as if you had sent that SQL query via ADO.NET. There is a little overhead in converting the expression tree for your query into the equivalent T-SQL, but in my experience this is small compared with the actual query time.
You can of course find out exactly what T-SQL is generated (for example, by assigning a TextWriter to DataContext.Log), and therefore make sure you have good supporting indexes.
The primary difference from DataViews is that LINQ to SQL does not bring all the data into memory and filter it there. Rather, it gets the database to do what it's good at and only brings the matching data into memory.
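To make that contrast concrete, here is a minimal sketch. It is written in Python with the standard-library sqlite3 module purely as an illustration, not as LINQ to SQL itself, and the `records` table, its columns, and the function names are all hypothetical. It compares fetching every row and filtering in memory (the DataView-style approach) with letting the database apply the filter.

```python
import sqlite3


def make_db(n_rows=1000, group_size=50):
    """Build an in-memory database with a hypothetical `records` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, group_id INTEGER, value REAL)"
    )
    conn.executemany(
        "INSERT INTO records (id, group_id, value) VALUES (?, ?, ?)",
        [(i, i // group_size, float(i)) for i in range(n_rows)],
    )
    # A supporting index lets the database locate one group without a full scan.
    conn.execute("CREATE INDEX ix_records_group ON records (group_id)")
    conn.commit()
    return conn


def filter_in_memory(conn, group_id):
    """DataView-style: pull every row across, then filter on the client."""
    all_rows = conn.execute(
        "SELECT id, group_id, value FROM records ORDER BY id"
    ).fetchall()
    matching = [row for row in all_rows if row[1] == group_id]
    return matching, len(all_rows)  # second value = rows transferred


def filter_in_database(conn, group_id):
    """Query-translation style: the WHERE clause runs inside the database."""
    matching = conn.execute(
        "SELECT id, group_id, value FROM records WHERE group_id = ? ORDER BY id",
        (group_id,),
    ).fetchall()
    return matching, len(matching)  # only matching rows are transferred
```

Both functions return the same rows for a given group, but the first transfers the whole table while the second transfers only the rows in that group. With 100,000 or more records and groups of about 50, that difference in transferred data is likely to matter more than the query-translation overhead.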
It depends on what you're trying to do. LINQ has been very fast for me when pulling data from the database, since LINQ to SQL translates your request directly into SQL and runs it. However, I've found that stored procedures are the better choice in some circumstances.
For instance, I have some data that I need to query which involves several tables and fairly complex keys. With LINQ, and its relative inflexibility when it comes to customizing queries, these queries would take several minutes. By hand-tweaking the SQL (namely, by placing WHERE-type conditions in the JOINs in order to reduce the amount of data each JOIN has to handle), I was able to drastically improve performance.
My advice: use LINQ wherever you can, but don't be afraid to go the stored procedure route if you determine that the SQL generated by LINQ is simply too slow and the SQL can easily be hand-tweaked to accomplish what you need.
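As a rough illustration of that kind of hand-tweak, the sketch below shows the same query written with the filter in the WHERE clause and with the filter moved into the JOIN condition. It uses Python's standard-library sqlite3 module as a stand-in for SQL Server, and the `customers` and `orders` tables, their columns, and the helper names are hypothetical. For an inner join the two forms are logically equivalent, so whether the rewrite is actually faster depends on the query planner and the data; measure it on your own database.

```python
import sqlite3


def build_sample(conn):
    """Create two small hypothetical tables to join."""
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        """
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "EU"), (2, "US"), (3, "EU")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 50.0), (11, 2, 75.0), (12, 3, 20.0), (13, 1, 5.0)],
    )


# Filter expressed after the join, in the WHERE clause.
FILTER_IN_WHERE = """
SELECT o.id, o.total
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE c.region = ? AND o.total >= ?
ORDER BY o.id
"""

# Same filter expressed as part of the join condition.
FILTER_IN_JOIN = """
SELECT o.id, o.total
FROM orders AS o
JOIN customers AS c
  ON c.id = o.customer_id AND c.region = ? AND o.total >= ?
ORDER BY o.id
"""


def run(conn, sql, region, min_total):
    """Execute one of the query variants with bound parameters."""
    return conn.execute(sql, (region, min_total)).fetchall()
```

Running both variants with the same parameters should return identical rows, which is a useful sanity check before comparing timings. Note that this equivalence holds for inner joins only; with outer joins, moving a condition from WHERE into the JOIN changes the result.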