 

Ensure "Reasonable" queries only

In our organization we have the need to let employees filter data in our web application by supplying WHERE clauses. It's worked great for a long time, but we occasionally run into users providing queries that require full table scans on large tables or inefficient joins, etc.

Some clown might write something like:

select * from big_table where
Name in (select name from some_table where name like '%search everything%')
or name in ('a', 'b', 'c')
or price < 20
or price > 40
or exists (select 1 from some_other_table where col1 + col2 + col3 = 4)
or exists (select 1 from table_a, table_b)

Obviously, this is not a great way to query these tables with computed values, non-indexed columns, lots of OR's and an unrestricted join on table_a and table_b.

But for a user, this may make total sense.

So what's the best way, if any, to allow internal users to supply a query to the database while ensuring that it won't lock a dozen tables and hang the webserver for 5 minutes?

I'm guessing there's a programmatic way in C#/SQL Server to get the execution plan for a query before it runs. If so, what factors contribute to cost? Estimated I/O cost? Estimated CPU cost? What would be reasonable limits at which to tell the user that his query's no good?
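(One possible approach, sketched here with a made-up table and purely as an illustration: ask SQL Server for the estimated plan without executing the statement, then inspect the cost it reports.)

```sql
-- Ask the server to return the estimated plan instead of executing.
SET SHOWPLAN_XML ON;
GO
-- The user-supplied query goes here; it is compiled but NOT run.
SELECT * FROM big_table WHERE price < 20 OR price > 40;
GO
SET SHOWPLAN_XML OFF;
GO
-- The single row returned contains the plan XML; its StmtSimple element
-- carries an EstimatedTotalSubtreeCost attribute that can be compared
-- against whatever threshold you decide is "reasonable".
```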

EDIT: We're a market research company. We have thousands of surveys, each with their own data. We have dozens of researchers that want to slice that data in arbitrary ways. We have tools to let them construct "valid" filters using a GUI, but some "power users" want to supply their own queries. I realize this isn't standard or best practice, but how else can I let dozens of users query tables for the rows they want using arbitrarily complex conditions and ever-changing conditions?

asked Jan 06 '10 by Jody Powlette




3 Answers

The premise of your question states:

In our organization we have the need to let employees filter data in our web application by supplying WHERE clauses.

I find this premise to be flawed on its face. I can't imagine a situation where I would allow users to do this. In addition to the problems you have already identified, you are opening yourself up to SQL Injection attacks.

I would highly recommend reassessing your requirements to see if you can't build a safer, more focused way of allowing your users to search.

However, if your users really are sophisticated (and trusted!) enough to be supplying WHERE clauses directly, they need to be educated on what they can and can't submit as a filter.

answered Oct 08 '22 by Phil Sandler


You can try using the following:

SET SHOWPLAN_ALL ON   -- return estimated plan rows instead of executing
GO
SET FMTONLY ON        -- return metadata only; nothing is actually run
GO
<<< Your SQL code here >>>
GO
SET FMTONLY OFF
GO
SET SHOWPLAN_ALL OFF
GO

Then you can parse through what you've got. As to where to draw the line on various things, that's going to take some experience. There are some things to watch for, but nothing that is cut and dried. It's often more of an art to examine the query plans than a science.
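(To make the parsing step concrete: SHOWPLAN_ALL returns one row per plan operator, with columns such as StmtText, EstimateRows, EstimateIO, EstimateCPU and TotalSubtreeCost; the TotalSubtreeCost on the top row is the optimizer's estimate for the whole statement. A conceptual sketch of a gate — the #plan table and the 50.0 threshold are hypothetical, and you'd tune the threshold by experience:)

```sql
-- Conceptual: assume the SHOWPLAN_ALL rows have been captured into a
-- temp table named #plan (hypothetical). Reject the query if the
-- top-level estimated cost exceeds your chosen threshold.
IF EXISTS (SELECT 1 FROM #plan
           WHERE NodeId = 1 AND TotalSubtreeCost > 50.0)
    RAISERROR('Query rejected: estimated cost too high.', 16, 1);
```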

As others have pointed out though, I think that your problem goes deeper than the technology implications. The fact that you let unqualified people access your database in such a way is the underlying problem. From past experience, I often see this in companies where they are too lazy or too inexperienced to properly capture their application's requirements. I'm not saying that this is necessarily the case with your corporate environment, but that's what I've seen.

answered Oct 08 '22 by Tom H


In addition to trying to control what the users enter (which is a losing battle; there will always be a new hire who comes up with an imaginative query), I'd look into Resource Governor, see Managing SQL Server Workloads with Resource Governor. You put the ad-hoc queries into a separate pool and cap the allocated resources. This way you can mitigate the problem by limiting the amount of damage a bad query can do to other tasks.
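(For illustration, a minimal Resource Governor setup might look like the following sketch; the pool, group, login name and caps are all hypothetical examples you'd tune for your workload:)

```sql
-- Create a capped pool and a workload group for ad-hoc user queries.
CREATE RESOURCE POOL AdHocPool
    WITH (MAX_CPU_PERCENT = 20, MAX_MEMORY_PERCENT = 20);

CREATE WORKLOAD GROUP AdHocGroup
    WITH (REQUEST_MAX_CPU_TIME_SEC = 60)  -- cap one request's CPU time
    USING AdHocPool;
GO
-- Classifier function routes incoming sessions to the ad-hoc group,
-- here by login name (hypothetical login).
CREATE FUNCTION dbo.fnClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF SUSER_SNAME() = N'adhoc_user'
        RETURN N'AdHocGroup';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```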

You should also consider giving access to the data by other means, like Power Pivot, and let users massage their data as hard as they want in their own Excel. Business power users love that, and the impact on the transaction processing server is minimal.

answered Oct 08 '22 by Remus Rusanu