Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL query very slow because of BLOB field (that can't be moved in another table)

Tags:

mysql

blob

I am developping a PyQT software based on a MySql Database. The database contains some recorded electrical signals, and all the information describing these signals (sampling rate, date of recoding etc...).

To have an idea, one database contains between 10 000 and 100 000 rows, and total size is >10Gb. All these data are stored on a dedicated server. In fact, most of the data is the signal itself, which is in a BLOB field called analogsignal.signal (see below)

here is the architecture of the database : http://packages.python.org/OpenElectrophy/_images/simple_diagram1.png

I can't change it (I can add columns and indexes, but I can not move or delete existing columns).

In the software, I need to list all the analogsignal columns (id, name, channel, t_start,sampling_rate), except analogsignal.signal, which is called later via the analogsignal.id. So I'm doing the following query

SELECT block.id, block.datetime, segment.id, analogsignal.id, analogsignal.name, analogsignal.channel, analogsignal.sampling_rate, block.fileOrigin, block.info

FROM segment, block, analogsignal

WHERE block.id=segment.id_block

AND segment.id=analogsignal.id_segment

ORDER BY analogsignal.id

The problem is, my queries are vey slow (> 10 min if the request is not in cache) because of the presence of analogsignal.signal column. If i understand correctly what's happening, the table is read line by line, including analogsignal.signal, even if the analogsignal.signal is not in the SELECT field.

Does anyone have an idea how to optimize the database or the query without moving the BLOB in an other table (which I agree would be more logical, but I do not control this point).

Thank you!

Here's the CREATE TABLE command for the AnalogSignal table (pulled/formatted from comment)

CREATE TABLE analogsignal 
  ( id int(11) NOT NULL AUTO_INCREMENT, 
    id_segment int(11) DEFAULT NULL,
    id_recordingpoint int(11) DEFAULT NULL, 
    name text, 
    channel int(11) DEFAULT NULL, 
    t_start float DEFAULT NULL, 
    sampling_rate float DEFAULT NULL, 
    signal_shape varchar(128) DEFAULT NULL, 
    signal_dtype varchar(128) DEFAULT NULL, 
    signal_blob longblob, Tag text, 
    PRIMARY KEY (id), 
    KEY ix_analogsignal_id_recordingpoint (id_recordingpoint),
    KEY ix_analogsignal_id_segment (id_segment) 
  ) ENGINE=MyISAM AUTO_INCREMENT=34798 DEFAULT CHARSET=latin1 ;

EDIT: Problem solved, here are the key points:

-I had to add a multiple column index, type INDEX on all he SELECT fields in the analogsignal table
-The columns of 'TEXT' type blocked the use of the index. I converted these TEXT fields in VARCHAR(xx). for this I used this simple command:

SELECT MAX(LENGTH(field_to_query)) FROM table_to_query

to check the minimal text length before conversion, to be sure that I will not loose any data

ALTER TABLE table_to_query CHANGE field_to_query field_to_query VARCHAR(24)

I first used VARCHAR(8000), but with this setting, VARCHAR was like a TEXT field, and indexing didn't worked. No such problem with VARCHAR(24). If I'm right, the total TEXT length (all fields included) in a query must no pass 1000 bytes

Then I indexed all the columns as said above, with no size parameter in the index

Finally, using a better query structure (thank you DRapp), improved also the query. I passed from 215s to 0.016s for the query, with no cache...

like image 859
Vantoine Avatar asked Nov 08 '12 13:11

Vantoine


2 Answers

In addition to trying to shrink your "blob" column requirements by putting the data to an external physical file and just storing the path\file name in the corresponding record, I would try the following as an alternative...

I would reverse the query and put your AnalogSignal table first as it is basis of the order by clause and reverse the query backwards to the blocks. Also, to prevent having to read every literal row of data, if you build a compound index on all columns you want in your output, it would make a larger index, but then the query will pull the values directly from the key expression instead of from reading back to the actual rows of data.

create index KeyDataOnly on AnalogSignal ( id, id_segment, name, channel, sampling_rate )

SELECT STRAIGHT_JOIN
      block.id, 
      block.datetime, 
      segment.id, 
      analogsignal.id, 
      analogsignal.name, 
      analogsignal.channel, 
      analogsignal.sampling_rate, 
      block.fileOrigin, 
      block.info
   FROM 
      analogsignal
         JOIN Segment
            on analogsignal.id_segment = segment.id
            JOIN block
               on segment.id_block = block.id
   ORDER BY 
      analogsignal.id
like image 135
DRapp Avatar answered Sep 26 '22 00:09

DRapp


If you cannot delete the BLOB column, do you have to fill it? You could add a column for storing the path/to/filename of your signal and then put all your signal files in the appropriate directory(s). Once that's done, set your BLOB field values to null.

It's probably breaking the spirit of the restrictions you're under. But arbitrary restrictions often need to be circumvented.

like image 26
dnagirl Avatar answered Sep 24 '22 00:09

dnagirl