Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Handling Complex WHERE clauses with a PHP Query Builder

There are several ActiveRecord styled query builder libraries out there. Some are stand alone and some come built into frameworks. However, they really have trouble with WHERE and HAVING clauses when it comes to complex SQL. Setting other databases aside - I am trying to come up with a MySQL and PostgreSQL compatible WHERE() method that could fix these current method downfalls.

What follows is a long list of ideas and examples showing the best I could come up with so far. However, I can't seem to solve all of the use cases and I feel my partial solution is sloppy. Anyone that can answer with something that solves all of these problems will not only answer this question - but a will be responsible for fixing a problem that has hunted PHP implementations for several years now.

Common Operators

    =   Equal
    <>  Not Equal
    >   Greater Than
    <   Less Than
    >=  Greater Than Or Equal
    <=  Less Than Or Equal
    BETWEEN between values on right 
    NOT logical NOT 
    AND logical AND 
    OR  logical OR

Example Where clauses

SELECT ... FROM table...
    WHERE column = 5
    WHERE column > 5
    WHERE column IS NULL
    WHERE column IN (1, 2, 3)
    WHERE column NOT IN (1, 2, 3)
    WHERE column IN (SELECT column FROM t2)
    WHERE column IN (SELECT c3 FROM t2 WHERE c2 = table.column + 10)
    WHERE column BETWEEN 32 AND 34
    WHERE column BETWEEN (SELECT c3 FROM t2 WHERE c2 = table.column + 10) AND 100
    WHERE EXISTS (SELECT column FROM t2 WHERE c2 > table.column)

There are many common ActiveRecord formats that the where() clause uses in the different current libraries.

$this->db->where(array('session_id' => '?', 'username' => '?'));
$this->db->fetch(array($id, $username));

// vs with is_int($key)
$this->db->where(array('session_id', 'username'));
$this->db->fetch(array($id, $username));

// vs with is_string($where)
$this->db->where('session_id', '?');
$this->db->where('username');
$this->db->fetch(array($id, $username));

// vs with is_array($value)
$this->db->where('session_id', '?');
$this->db->where('username', array('Sam', 'Bob'));
$this->db->fetch(array($id));

Here is the final format that I have so far. It should handle grouping (...) AND (...) as well as prepared statement bound params ("?" & ":name").

function where($column, $op = '=', $value = '?', $group = FALSE){}


// Single line

$this->db->where('column > 5');
$this->db->where('column IS NULL');

// Column + condition

$this->db->where('column', '=');
// WHERE column = ?     (prepared statement)
$this->db->where('column', '<>');
// WHERE column <> ?    (prepared statement)

// Column + condition + values

$this->db->where('column', '=', 5);
// // WHERE column = 5
$this->db->where('column', 'IN', '(SELECT column FROM t2)');
// WHERE column IN (SELECT column FROM t2)
$this->db->where('column', 'IN', array(1,2,3));
// WHERE column IN (1, 2, 3)
$this->db->where('column', 'NOT IN', array(1,2,3));
// WHERE column NOT IN (1, 2, 3)

// column + condition + values + group
$this->db->where(
    array(
        array('column', '<', 20), 
        array('column', '>', 10)
    ),
    NULL,
    NULL,
    $group = TRUE
);
// WHERE (column < 20 AND column > 10)

:UPDATE:

Over the course of my question I came to realize that WHERE and HAVING conditions only get more complex the deeper you go. Trying to abstract even 80% of the features would result in a massive library just for WHERE and HAVING. As Bill points out, that just isn't reasonable for a scripting language like PHP.

The solution is just to hand craft the WHERE portion of your query. As long as you use " around your columns you can use the same WHERE query in Postgre, SQLite, and MySQL since they use almost the same SQL syntax. (For MySQL you must str_replace() them with a tick`).

There comes a point where abstraction hurts more than it helps, WHERE conditions are one such place.

like image 296
Xeoncross Avatar asked Dec 17 '09 23:12

Xeoncross


3 Answers

I worked quite a bit on the Zend_Db library, which includes a PHP class for constructing SQL queries. I decided to punt on trying to handle every imaginable SQL syntax in WHERE and HAVING clauses, for several reasons:

  • PHP is a scripting language that parses and compiles code on every request (unless you use a bytecode cache). So the PHP environment is sensitive to bulky code libraries -- more so than Java or C# or Python or what have you. It's therefore a high priority to keep libraries as lean as we can.

    All of the Zend_Db library I worked on was about 2,000 lines of PHP code. By contrast, Java Hibernate is on the order of 118K lines of code. But that's not so much of an issue since a Java library is precompiled and doesn't have to be loaded on every request.

  • SQL expressions follow a generative grammar that is more compact, and easier to read and maintain that any of the PHP-based construction you showed. Learning the SQL expression grammar is far easier than learning an API that can simulate it. You end up supporting a "simplified grammar." Or else you start out that way, and find yourself coerced by your user community into Feature Creep until your API is unusably complex.

  • To debug an application that used such an API, you'd inevitably need access to the final SQL expression, so it's about the leakiest abstraction you can have.

  • The only advantage to using a PHP-based interface for SQL expressions would be that it assists code-completion in smart editors and IDE's. But when so many of the operators and operands use string constants like '>=', you spoil any code-completion intelligence.


update: I just read a good blog article "A Farewell to ORMs." The writer, Aldo Cortesi, suggests using the SQL Expression Language in Python's SQLAlchemy. Syntactic sugar and operator overloading that is standard in Python (but not supported in PHP) make this a very effective query-generating solution.

You might also look at Perl's DBIx::Class, but it ends up being pretty ugly.

like image 166
Bill Karwin Avatar answered Oct 05 '22 22:10

Bill Karwin


This is part of my ActiveRecord class, I don't handle sub queries (I don't even bother):

public function Having($data, $operator = 'LIKE', $merge = 'AND')
{
    if (array_key_exists('query', $this->sql) === true)
    {
        foreach ($data as $key => $value)
        {
            $this->sql['having'][] = ((empty($this->sql['having']) === true) ? 'HAVING' : $merge) . ' ' . $this->Tick($key) . ' ' . $operator . ' ' . $this->Quote($value);
        }
    }

    return $this;
}

public function Where($data, $operator = 'LIKE', $merge = 'AND')
{
    if (array_key_exists('query', $this->sql) === true)
    {
        foreach ($data as $key => $value)
        {
            $this->sql['where'][] = ((empty($this->sql['where']) === true) ? 'WHERE' : $merge) . ' ' . $this->Tick($key) . ' ' . $operator . ' ' . $this->Quote($value);
        }
    }

    return $this;
}

One other thing that you may consider is having a customHaving() and customWhere() methods.

like image 33
Alix Axel Avatar answered Oct 05 '22 22:10

Alix Axel


I know this is an extremely old posting, but I'm going to reply to it anyway, because I'm in the process of developing my own classes to meet similar needs to what the question asks.

After looking into it, I've found that the problem with Zend-Db and other such engines is that they try to be all things to all people. To appeal to the largest audience, they need to offer the most general functionality, which becomes their own undoing as far as I can see (and as expertly explained by Bill Karwin).

One of the most obvious over-complications that many engines do, is to confuse the generation of SQL code with its execution (making it easier to write dirty SQL). In many applications, it's a good idea to separate both of these quite explicitly, encouraging the developer to think about injection attacks etc.

In building an SQL engine, the first thing to try to do, is to limit the scope of what SQL your engine can produce. You should not allow it to produce a select * from table for example; the engine should require the developer to define each select, where and having column explicitly. As another example, it's often useful to require every column to have an alias (normally not required by the database).

Notice that limiting the SQL in these ways does not limit what you can actually get out of the database. Yes, it makes the up-front coding more verbose on occasion, but it also makes it more structured, and lets you dump hundreds of lines of library-code which were only ever there in the first place to deal with complicated exceptions and provide (ahem) "flexibility".

The libraries I've written so far are about 600 lines of code (~170 lines of which is error-handling). It deals with ISO joins, sub-statements (in the SELECT, FROM and WHERE clauses), any 2-sided comparison clause, IN, EXISTS and BETWEEN (with sub-statements in the WHERE clause). It also implicitly creates bindings, instead of directly injecting values into the SQL.

Limitations (other than those already mentioned): the SQL is written expressly for Oracle. Untested on any other database platform.

I'm willing to share the code, assuming that any improvements are sent back.

As an example of what the libraries let me produce, I hope that the following is simple enough to be intuitive, while also being complex enough to show the expandability potential:

<?php
$substmt = new OraSqlStatement;
$substmt->AddVarcharCol ('value','VALUE')
        ->AddVarcharCol ('identity','UID',false)
        ->AddVarcharCol ('type','info_type',false)
        ->AddFrom ('schemaa.user_propertues','up')
        ->AddWhere ('AND')
        ->AddComparison ('UID', '=', 'e.identity', 'column')
        ->AddComparison ('info_type', '=', 'MAIL_ADDRESS');

$stmt = new OraSqlStatement;
$stmt->AddVarcharCol ('company_id', 'Company')
     ->AddVarcharCol ('emp_no',     'Emp Id')
     ->AddVarcharCol ('person_id',  'Pers Id')
     ->AddVarcharCol ('name',       'Pers Name')
     ->AddDateCol ('employed_date', 'Entry Date')
     ->AddDateCol ('leave_date', 'Leave Date')
     ->AddVarcharCol ('identity',   'User Id')
     ->AddVarcharCol ('active', 'Active')
     ->AddVarcharCol ($substmt, 'mail_addy')
     ->AddFrom ('schemab.employee_tab', 'e')
     ->AddFrom ('schemaa.users_vw','u','INNER JOIN','u.emp_no=e.emp_number')
     ->AddWhere ('AND')
     ->AddComparison ('User Id', '=', 'my_user_id')
     ->AddSubCondition ('OR')
     ->AddComparisonNull ('Leave Date', false)
     ->AddComparisonBetween ('Entry Date', '2011/01/01', '2011/01/31');

echo $stmt->WriteSql();
var_dump($stmt->GetBindArray());
?>

Which produces:

SELECT 
  company_id "Company", emp_no "Emp Id", person_id "Pers Id", name "Pers Name", 
  employed_date "Entry Date", leave_date "Leave Date", identity "User Id", active "Active", 
  ( SELECT value "VALUE" FROM schemaa.user_propertues up 
    WHERE  upper(identity) = upper(e.identity)
      AND  upper(TYPE) = upper (:var0) 
  ) "mail_addy" 
FROM 
  schemab.employee_tab e 
      INNER JOIN schemaa.users_vw u ON u.emp_no = e.emp_number 
WHERE 
        upper (identity) = upper (:var1)
  AND ( leave_date IS NOT NULL OR
        employed_date BETWEEN to_date (:var2,'YYYY/MM/DD') AND to_date (:var3,'YYYY/MM/DD') 
      )

Along with the bind array:

array
  0 => string 'MAIL_ADDRESS' (length=12)
  1 => string 'my_user_id' (length=10)
  2 => string '2011/01/01' (length=10)
  3 => string '2011/01/31' (length=10)
like image 38
cartbeforehorse Avatar answered Oct 05 '22 22:10

cartbeforehorse