Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using an IN clause with LINQ-to-SQL's ExecuteQuery

LINQ to SQL did a horrible job translating one of my queries, so I rewrote it by hand. The problem is that the rewrite necessarily involves an IN clause, and I cannot for the life of me figure out how to pass a collection to ExecuteQuery for that purpose. The only thing I can come up with, which I've seen suggested on here, is to use string.Format on the entire query string to kluge around it—but that will prevent the query from ever ending up in the query cache.

What's the right way to do this?

NOTE: Please note that I am using raw SQL passed to ExecuteQuery. I said that in the very first sentence. Telling me to use Contains is not helpful, unless you know a way to mix Contains with raw SQL.

like image 871
Benjamin Pollack Avatar asked Dec 17 '22 02:12

Benjamin Pollack


1 Answers

Table-Valued Parameters

On Cheezburger.com, we often need to pass a list of AssetIDs or UserIDs into a stored procedure or database query.

The bad way: Dynamic SQL

One way to pass this list in was to use dynamic SQL.

 IEnumerable<long> assetIDs = GetAssetIDs();
 var myQuery = "SELECT Name FROM Asset WHERE AssetID IN (" + assetIDs.Join(",") + ")";
 return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"), myQuery);

This is a very bad thing to do:

  1. Dynamic SQL gives attackers a weakness by making SQL injection attacks easier.
    Since we are usually just concatenating numbers together, this is highly unlikely, but if you start concatenating strings together, all it takes is one user to type ';DROP TABLE Asset;SELECT ' and our site is dead.
  2. Stored procedures can't have dynamic SQL, so the query had to be stored in code instead of in the DB schema.
  3. Every time we run this query, the query plan must be recalculated. This can be very expensive for complicated queries.

However, it does have the advantage that no additional decoding is necessary on the DB side, since the AssetIDs are found by the query parser.

The good way: Table-Valued Parameters

SQL Server 2008 adds a new ability: users can define a table-valued database type. Most other types are scalar (they only return one value), but table-valued types can hold multiple values, as long as the values are tabular.

We've defined three types: varchar_array, int_array, and bigint_array.

CREATE TYPE bigint_array AS TABLE (Id bigint NOT NULL PRIMARY KEY)

Both stored procedures and programmatically defined SQL queries can use these table-valued types.

  IEnumerable<long> assetIDs = GetAssetIDs();
  return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"),
      "SELECT Name FROM Asset WHERE AssetID IN (SELECT Id FROM @AssetIDs)", 
      new Parameter("@AssetIDs", assetIDs));

Advantages

  1. Can be used in both stored procedures and programmatic SQL without much effort
  2. Not vulnerable to SQL injection
  3. Cacheable, stable queries
  4. Does not lock the schema table
  5. Not limited to 8k of data
  6. Less work done by both DB server and the Mine apps, since there is no concatenation or decoding of CSV strings.
  7. "typical use" statistics can be derived by the query analyzer, which can lead to even better performance.

Disadvantages

  1. Only works on SQL Server 2008 and above.
  2. Rumors that TVP are prebuffered in their entirety before execution of the query, which means phenomenally large TVPs may be rejected by the server. Further investigation of this rumor is ongoing.

Further reading

This article is a great resource to learn more about TVP.

like image 186
Stefan Rusek Avatar answered Dec 27 '22 11:12

Stefan Rusek