Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DataTables: Speed up server side processing with multiple tables, grouping, and html content?

I'm using datatables to display data from multiple mySQL tables (7 of them actually). Currently there really isn't much data, but I see "Showing 1 to 7 of 7 entries (filtered from 642,660,480 total entries)." and it takes 20 seconds to display only 7 entries. Once I really start adding a lot of content to the database, I'm sure this is going to be unusable.

I'm sure that there are better ways to accomplish what I'm trying to do, but this is the only way I've been able to get it working.

Here is my server side script:

$q1 = "'";
$q2 = '"';

$order_id = "CONCAT( ".$q2."<input type='hidden' id='order_id' value='".$q2.", o.id, ".$q2."'><a href='order_details.php?id=".$q2.", o.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", o.id )";
$patient_name = "CONCAT( ".$q2."<input type='hidden' id='patient_name' value='".$q2.", p.first_name, ' ', p.last_name, ".$q2."'><input type='hidden' id='patient_id' value='".$q2.", p.id, ".$q2."'><input type='hidden' id='patient_ssn' value='".$q2.", p.ssn, ".$q2."'><a href='patient_details.php?id=".$q2.", p.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", p.first_name, ' ', p.last_name )";
$doc_name = "CONCAT( ".$q2."<input type='hidden' id='doctor_name' value='".$q2.", d.first_name, ' ', d.last_name, ".$q2."'><input type='hidden' id='doctor_id' value='".$q2.", d.id, ".$q2."'><a href='doctor_details.php?id=".$q2.", d.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", d.first_name, ' ', d.last_name )";
$order_date = "FROM_UNIXTIME(o.created_timestamp, '%m/%e/%Y')";
$tests = "GROUP_CONCAT(t.name SEPARATOR ', ')";

$aColumns = array($order_id, $order_date, $doc_name, $patient_name, $tests, 'o.status');

/* Indexed column (used for fast and accurate table cardinality) */
$sIndexColumn = "o.id";

/* DB table to use */
$sTable = "`orders` o, `patients` p, `doctors` d, `tests_ordered` tst, `tests` t, `users` u, `events` e";

$sWhere = "WHERE p.id = o.patient_id AND d.id = o.doctor_id AND tst.order_id = o.id AND t.id = tst.test_id AND u.username = o.assigned_username AND e.event_id = o.event_id";
$order_status = isset($_GET['status']) ? $_GET['status'] : 'all';
if($order_status == 'all'){

}else{
    $sWhere .= " AND (o.status='Complete' OR o.status='$order_status')";
}

$sGroupBy =  "GROUP BY o.id";

/* Database connection information */
$gaSql['user']       = DB_USER;
$gaSql['password']   = DB_PASSWORD;
$gaSql['db']         = DB_NAME;
$gaSql['server']     = DB_SERVER;

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 * If you just want to use the basic configuration for DataTables with PHP server-side, there is
 * no need to edit below this line
 */

/* 
 * MySQL connection
 */
$gaSql['link'] =  mysql_pconnect( $gaSql['server'], $gaSql['user'], $gaSql['password']  ) or
    die( 'Could not open connection to server' );

mysql_select_db( $gaSql['db'], $gaSql['link'] ) or 
    die( 'Could not select database '. $gaSql['db'] );

/* 
 * Paging
 */
$sLimit = "";
if ( isset( $_GET['iDisplayStart'] ) && $_GET['iDisplayLength'] != '-1' )
{
    $sLimit = "LIMIT ".mysql_real_escape_string( $_GET['iDisplayStart'] ).", ".
        mysql_real_escape_string( $_GET['iDisplayLength'] );
}

/*
 * Ordering
 */
if ( isset( $_GET['iSortCol_0'] ) )
{
    $sOrder = "ORDER BY  ";
    for ( $i=0 ; $i<intval( $_GET['iSortingCols'] ) ; $i++ )
    {
        if ( $_GET[ 'bSortable_'.intval($_GET['iSortCol_'.$i]) ] == "true" )
        {
            $sOrder .= $aColumns[ intval( $_GET['iSortCol_'.$i] ) ]."
                ".mysql_real_escape_string( $_GET['sSortDir_'.$i] ) .", ";
        }
    }

    $sOrder = substr_replace( $sOrder, "", -2 );
    if ( $sOrder == "ORDER BY" )
    {
        $sOrder = "";
    }
}

/* 
 * Filtering
 * NOTE this does not match the built-in DataTables filtering which does it
 * word by word on any field. It's possible to do here, but concerned about efficiency
 * on very large tables, and MySQL's regex functionality is very limited
 */

if ( $_GET['sSearch'] != "" )
{
    $sWhere .= " AND (";
    for ( $i=0 ; $i<count($aColumns) ; $i++ )
    {
if($i!=4){ //skip tests column
        $sWhere .= $aColumns[$i]." LIKE '%".mysql_real_escape_string( $_GET['sSearch'] )."%' OR ";
}
    }
    $sWhere = substr_replace( $sWhere, "", -3 );
    $sWhere .= ')';
}

/* Individual column filtering */
for ( $i=0 ; $i<count($aColumns) ; $i++ )
{
if($i!=4){ //skip tests column
    if ( $_GET['bSearchable_'.$i] == "true" && $_GET['sSearch_'.$i] != '' )
    {
        if ( $sWhere == "" )
        {
            $sWhere = "WHERE ";
        }
        else
        {
            $sWhere .= " AND ";
        }
        $sWhere .= $aColumns[$i]." LIKE '%".mysql_real_escape_string($_GET['sSearch_'.$i])."%' ";
    }
}
}


/*
 * SQL queries
 * Get data to display
 */
$sQuery = "
    SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))."
    FROM   $sTable
    $sWhere
    $sGroupBy 
    $sOrder
    $sLimit
";

//echo $sQuery;
//die();

$rResult = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error());

/* Data set length after filtering */
$sQuery = "
    SELECT FOUND_ROWS()
";
$rResultFilterTotal = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error());
$aResultFilterTotal = mysql_fetch_array($rResultFilterTotal);
$iFilteredTotal = $aResultFilterTotal[0];

/* Total data set length */
$sQuery = "
    SELECT COUNT(".$sIndexColumn.")
    FROM   $sTable
";
$rResultTotal = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error());
$aResultTotal = mysql_fetch_array($rResultTotal);
$iTotal = $aResultTotal[0];

//added to hide filtering   
//$iTotal = $iFilteredTotal;

/*
 * Output
 */
$output = array(
    "sEcho" => intval($_GET['sEcho']),
    "iTotalRecords" => $iTotal,
    "iTotalDisplayRecords" => $iFilteredTotal,
    "aaData" => array()
);

while ( $aRow = mysql_fetch_array( $rResult ) )
{
    $row = array();
    for ( $i=0 ; $i<count($aColumns) ; $i++ )
    {
        if ( $aColumns[$i] != ' ' )
        {
            /* General output */
            $row[] = $aRow[$i];
        }
    }
    $output['aaData'][] = $row;
}

echo json_encode( $output );

Here is the query that the server side script is generating:

SELECT SQL_CALC_FOUND_ROWS 
    CONCAT( "<input type='hidden' id='order_id' value='", o.id, "'><a href='order_details.php?id=", o.id, "'><img src='search.png' border='0'></a> &nbsp;", o.id ),
    FROM_UNIXTIME(o.created_timestamp, '%m/%e/%Y'),
    CONCAT( "<input type='hidden' id='doctor_name' value='", d.first_name, ' ', d.last_name, "'><input type='hidden' id='doctor_id' value='", d.id, "'><a href='doctor_details.php?id=", d.id, "'><img src='search.png' border='0'></a> &nbsp;", d.first_name, ' ', d.last_name ),
    CONCAT( "<input type='hidden' id='patient_name' value='", p.first_name, ' ', p.last_name, "'><input type='hidden' id='patient_id' value='", p.id, "'><input type='hidden' id='patient_ssn' value='", p.ssn, "'><a href='patient_details.php?id=", p.id, "'><img src='search.png' border='0'></a> &nbsp;", p.first_name, ' ', p.last_name ), GROUP_CONCAT(t.name SEPARATOR ', '),
    o.status
FROM `orders` o, `patients` p, `doctors` d, `tests_ordered` tst, `tests` t, `users` u, `events` e
WHERE p.id = o.patient_id AND d.id = o.doctor_id AND tst.order_id = o.id AND t.id = tst.test_id AND u.username = o.assigned_username AND e.event_id = o.event_id AND (o.status='Complete' OR o.status='Draft')
GROUP BY o.id

I'm trying to determine what I can do to optimize this without breaking the search and sorting features of datatables. I've created indexes and set primary keys for all tables as best as I know how. Is there a way to use a JOIN maybe?

Here is the output of EXPLAIN statement:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  u   index   NULL    PRIMARY 32  NULL    5   Using index; Using temporary; Using filesort
1   SIMPLE  o   ALL PRIMARY,patient_id,doctor_id,event_id,assigned_use...   NULL    NULL    NULL    6   Using where
1   SIMPLE  d   eq_ref  PRIMARY PRIMARY 4   pasdbadmin.o.doctor_id  1    
1   SIMPLE  e   eq_ref  PRIMARY PRIMARY 4   pasdbadmin.o.event_id   1   Using index
1   SIMPLE  tst ref order_id,test_id    order_id    4   pasdbadmin.o.id 1    
1   SIMPLE  t   eq_ref  PRIMARY PRIMARY 4   pasdbadmin.tst.test_id  1    
1   SIMPLE  p   eq_ref  PRIMARY PRIMARY 4   pasdbadmin.o.patient_id 1

UPDATE:

The problem seems to have been an issue with using including the users and events tables in the query (neither of which were actually used). Here is the revised code that executes much faster:

$q1 = "'";
$q2 = '"';

$order_id = "CONCAT( ".$q2."<input type='hidden' id='order_id' value='".$q2.", o.id, ".$q2."'><a href='order_details.php?id=".$q2.", o.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", o.id )";
$patient_name = "CONCAT( ".$q2."<input type='hidden' id='patient_name' value='".$q2.", p.first_name, ' ', p.last_name, ".$q2."'><input type='hidden' id='patient_id' value='".$q2.", p.id, ".$q2."'><input type='hidden' id='patient_ssn' value='".$q2.", p.ssn, ".$q2."'><a href='patient_details.php?id=".$q2.", p.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", p.first_name, ' ', p.last_name )";
$doc_name = "CONCAT( ".$q2."<input type='hidden' id='doctor_name' value='".$q2.", d.first_name, ' ', d.last_name, ".$q2."'><input type='hidden' id='doctor_id' value='".$q2.", d.id, ".$q2."'><a href='doctor_details.php?id=".$q2.", d.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", d.first_name, ' ', d.last_name )";
$order_date = "FROM_UNIXTIME(o.created_timestamp, '%m/%e/%Y')";
$tests = "GROUP_CONCAT(t.name SEPARATOR ', ')";

$aColumns = array($order_id, $order_date, $doc_name, $patient_name, $tests, 'o.status');

/* Indexed column (used for fast and accurate table cardinality) */
$sIndexColumn = "o.id";

/* DB table to use */
$sTable = "`orders` o, `patients` p, `doctors` d, `tests_ordered` tst, `tests` t";

$sWhere = "WHERE p.id = o.patient_id AND d.id = o.doctor_id AND tst.order_id = o.id AND t.id = tst.test_id";
$order_status = isset($_GET['status']) ? $_GET['status'] : 'all';
if($order_status == 'all'){

}else{
    $sWhere .= " AND (o.status='Complete' OR o.status='$order_status')";
}

$sJoin = "";

$sGroupBy =  "GROUP BY o.id";

/* Database connection information */
$gaSql['user']       = DB_USER;
$gaSql['password']   = DB_PASSWORD;
$gaSql['db']         = DB_NAME;
$gaSql['server']     = DB_SERVER;

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 * If you just want to use the basic configuration for DataTables with PHP server-side, there is
 * no need to edit below this line
 */

/* 
 * MySQL connection
 */
$gaSql['link'] =  mysql_pconnect( $gaSql['server'], $gaSql['user'], $gaSql['password']  ) or
    die( 'Could not open connection to server' );

mysql_select_db( $gaSql['db'], $gaSql['link'] ) or 
    die( 'Could not select database '. $gaSql['db'] );

/* 
 * Paging
 */
$sLimit = "";
if ( isset( $_GET['iDisplayStart'] ) && $_GET['iDisplayLength'] != '-1' )
{
    $sLimit = "LIMIT ".mysql_real_escape_string( $_GET['iDisplayStart'] ).", ".
        mysql_real_escape_string( $_GET['iDisplayLength'] );
}

/*
 * Ordering
 */
if ( isset( $_GET['iSortCol_0'] ) )
{
    $sOrder = "ORDER BY  ";
    for ( $i=0 ; $i<intval( $_GET['iSortingCols'] ) ; $i++ )
    {
        if ( $_GET[ 'bSortable_'.intval($_GET['iSortCol_'.$i]) ] == "true" )
        {
            $sOrder .= $aColumns[ intval( $_GET['iSortCol_'.$i] ) ]."
                ".mysql_real_escape_string( $_GET['sSortDir_'.$i] ) .", ";
        }
    }

    $sOrder = substr_replace( $sOrder, "", -2 );
    if ( $sOrder == "ORDER BY" )
    {
        $sOrder = "";
    }
}

/* 
 * Filtering
 * NOTE this does not match the built-in DataTables filtering which does it
 * word by word on any field. It's possible to do here, but concerned about efficiency
 * on very large tables, and MySQL's regex functionality is very limited
 */

if ( $_GET['sSearch'] != "" )
{
    $sWhere .= " AND (";
    for ( $i=0 ; $i<count($aColumns) ; $i++ )
    {
        if($i!=4){ //skip tests column
            $sWhere .= $aColumns[$i]." LIKE '%".mysql_real_escape_string( $_GET['sSearch'] )."%' OR ";
        }
    }
    $sWhere = substr_replace( $sWhere, "", -3 );
    $sWhere .= ')';
}

/* Individual column filtering */
for ( $i=0 ; $i<count($aColumns) ; $i++ )
{
    if($i!=4){ //skip tests column
        if ( $_GET['bSearchable_'.$i] == "true" && $_GET['sSearch_'.$i] != '' )
        {
            if ( $sWhere == "" )
            {
                $sWhere = "WHERE ";
            }
            else
            {
                $sWhere .= " AND ";
            }
            $sWhere .= $aColumns[$i]." LIKE '%".mysql_real_escape_string($_GET['sSearch_'.$i])."%' ";
        }
    }
}


/*
 * SQL queries
 * Get data to display
 */
$sQuery = "
    SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))."
    FROM   $sTable
    $sWhere
    $sJoin
    $sGroupBy 
    $sOrder
    $sLimit
";

$filename = __DIR__.DIRECTORY_SEPARATOR."sql_log.txt";
file_put_contents($filename, $sQuery, FILE_APPEND);

$rResult = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error());

/* Data set length after filtering */
$sQuery = "
    SELECT FOUND_ROWS()
";
$rResultFilterTotal = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error());
$aResultFilterTotal = mysql_fetch_array($rResultFilterTotal);
$iFilteredTotal = $aResultFilterTotal[0];

/* Total data set length */
$sQuery = "
    SELECT COUNT(".$sIndexColumn.")
    FROM   $sTable
";
$rResultTotal = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error());
$aResultTotal = mysql_fetch_array($rResultTotal);
$iTotal = $aResultTotal[0];

//added to hide filtering   
//$iTotal = $iFilteredTotal;

/*
 * Output
 */
$output = array(
    "sEcho" => intval($_GET['sEcho']),
    "iTotalRecords" => $iTotal,
    "iTotalDisplayRecords" => $iFilteredTotal,
    "aaData" => array()
);

while ( $aRow = mysql_fetch_array( $rResult ) )
{
    $row = array();
    for ( $i=0 ; $i<count($aColumns) ; $i++ )
    {
        if ( $aColumns[$i] != ' ' )
        {
            /* General output */
            $row[] = $aRow[$i];
        }
    }
    $output['aaData'][] = $row;
}

echo json_encode( $output );

If I were to use JOINS, the updated query would be:

 SELECT SQL_CALC_FOUND_ROWS 
     CONCAT( "<input type='hidden' id='order_id' value='", o.id, "'><a href='order_details.php?id=", o.id, "'><img src='search.png' border='0'></a> &nbsp;", o.id ),
     FROM_UNIXTIME(o.created_timestamp, '%m/%e/%Y'),
     CONCAT( "<input type='hidden' id='doctor_name' value='", d.first_name, ' ', d.last_name, "'><input type='hidden' id='doctor_id' value='", d.id, "'><a href='doctor_details.php?id=", d.id, "'><img src='search.png' border='0'></a> &nbsp;", d.first_name, ' ', d.last_name ),
     CONCAT( "<input type='hidden' id='patient_name' value='", p.first_name, ' ', p.last_name, "'><input type='hidden' id='patient_id' value='", p.id, "'><input type='hidden' id='patient_ssn' value='", p.ssn, "'><a href='patient_details.php?id=", p.id, "'><img src='search.png' border='0'></a> &nbsp;", p.first_name, ' ', p.last_name ),
     GROUP_CONCAT(t.name SEPARATOR ', '),
     o.status
 FROM `orders` o
 JOIN `doctors` d ON d.id = o.doctor_id
 JOIN `patients` p ON p.id = o.patient_id
 JOIN `tests_ordered` tst ON tst.order_id = o.id
 JOIN `tests` t ON t.id = tst.test_id 
 WHERE o.status='Complete' OR o.status='Draft'
 GROUP BY o.id

The problem with this is that DataTables just isn't designed to function correctly with JOINS when sorting and filtering is used due to the columns array, etc. I would love to see a solution that would work with a query like this though.

like image 688
compcentral Avatar asked Feb 26 '13 04:02

compcentral


1 Answers

For a start, if you want to optimize a SQL Statement, first get rid of that HTML crap in there. If nothing else, it obfuscates the actual structure of the Statement. If you must, you can put it back at the end of your optimization, although I'd seriously vote against it: You've got PHP to do the markup. For the sake of clarity I've made it a habit to use JOIN clauses, I rephrased the whole Thing accordingly.

What this process gave me was this:

SELECT SQL_CALC_FOUND_ROWS,
    o.id, o.created_timestamp, o.status,
    d.id, d.first_name, d.last_name, 
    p.id, p.first_name, p.last_name, p.ssn 
    GROUP_CONCAT(t.name SEPARATOR ', '),
FROM `orders` o
JOIN `doctors` d ON d.id = o.doctor_id
JOIN `patients` p ON p.id = o.patient_id
JOIN `users` u ON u.username = o.assigned_username
JOIN `events` e ON e.event_id = o.event_id
JOIN `tests_ordered` tst ON tst.order_id = o.id
JOIN `tests` t ON t.id = tst.test_id 
WHERE o.status='Complete' OR o.status='Draft'
GROUP BY o.id

There are a few things to be observed here:

1) Your main table is orders. That is also the one you're using the WHERE clause for AND you're GROUPing BY that. With id as primary index and a second index on status, this shouldn't be too bad.

2) You're linking in four other tables through was I assume to be foreign keys. Two of those tables are never actually used: there is (most probably) no need to join on usersand events at all. You should get rid of those, which incidentally also removes the less-than-stellar join on a text column (username). Make sure the remaining tables doctors and patients have their primary keys on their respective id columns.

3) You have a more complex join on two tables tests_ordered and tests. All that does is give you a concatenated string of names, but it does add the complexity of the GROUP BY clause. There are two ways to go from here: try to optimize those joins or remove them form the select entirely.

3) Solution A To optimize those joins, make sure you've got an index on tests_ordered.order_id and on tests_ordered.test_id and a Primary index on tests.id. Your Statement should look like this:

SELECT SQL_CALC_FOUND_ROWS,
    o.id, o.created_timestamp, o.status,
    d.id, d.first_name, d.last_name, 
    p.id, p.first_name, p.last_name, p.ssn 
    GROUP_CONCAT(t.name SEPARATOR ', '),
FROM `orders` o
JOIN `doctors` d ON d.id = o.doctor_id
JOIN `patients` p ON p.id = o.patient_id
JOIN `tests_ordered` tst ON tst.order_id = o.id
JOIN `tests` t ON t.id = tst.test_id 
WHERE o.status='Complete' OR o.status='Draft'
GROUP BY o.id

3) Solution B Remove the entire tests/tests_ordered stuff, and put it in a separate select. Your main select will now look like this:

SELECT SQL_CALC_FOUND_ROWS,
    o.id, o.created_timestamp, o.status,
    d.id, d.first_name, d.last_name, 
    p.id, p.first_name, p.last_name, p.ssn 
FROM `orders` o
JOIN `doctors` d ON d.id = o.doctor_id
JOIN `patients` p ON p.id = o.patient_id
WHERE o.status='Complete' OR o.status='Draft'

But you'll either have to run an additional SELECT per row to get the concatenated t.name or you do just one SELECT for all order ids on the current page. The latter would look like this:

SELECT o.id, GROUP_CONCAT(t.name SEPARATOR ', '),
FROM `orders` o
JOIN `tests_ordered` tst ON tst.order_id = o.id
JOIN `tests` t ON t.id = tst.test_id 
WHERE o.in IN ( <put the 10 ids on your current page here, separated by commas> )
GROUP BY o.id

Solution A should run pretty quick on a decent machine. Solution B should run pretty quick on any machine. Straight JOINS on indexed foreign keys are cheap.

4) None of the above selects should return 642 million rows on a database that has less than 642 million orders in it, which I assume you don't. Since MySQL tells you that it probably had to build a full cartesian product, which also explains the experienced speed. That means one of your straight foreign key joins went haywire. Most likely the one in question is the join on users - a useless join in the first place, but in any case: check the uniqueness of its username column.

like image 103
Hazzit Avatar answered Sep 28 '22 19:09

Hazzit