
SQL multiple rows to columns very slow performance

I'm having serious performance problems with a particular SQL row-to-column translation. A single query takes up to a minute to pull up 6 hours of data. I've simplified the tables below for clarity, but there are well over a million datapoints per day, so the table size might be contributing. I don't have much experience with complex queries like this, so I'd like some advice. The data tables are as follows...

(DataPoint table)

id     datetime
_____________________
1      2015-09-08 21:00:00
2      2015-09-08 21:00:01
3      2015-09-08 21:00:02
4      2015-09-08 21:00:03
5      2015-09-08 21:00:04

(SensorData table)

id     datapointId     SensorId     Temp    DateTime
_____________________________________________________
1      1               20           34.6    2015-09-08 21:00:00.345
2      1               21           34.2    2015-09-08 21:00:00.551
3      1               22           34.1    2015-09-08 21:00:00.101
4      1               41           34.3    2015-09-08 21:00:00.700
5      2               20           34.2    2015-09-08 21:00:01.223
6      2               21           34.4    2015-09-08 21:00:01.456
7      2               22           34.5    2015-09-08 21:00:01.100
8      2               41           34.6    2015-09-08 21:00:01.870

Note that the individual sensor DateTime values and the DataPoint DateTime are not equal (but they are close). This is why I went with a second table: it gives me the ability to do a rough side-by-side comparison without having to do some fuzzy DateTime-matching logic.

I need to query this information with a resulting dataset that looks like this...

DateTime              Temp1    Temp2    Temp3
__________________________________________________
2015-09-08 21:00:00   34.5       34.1       41.1
2015-09-08 21:00:01   34.4       34.2       41.2
2015-09-08 21:00:02   34.4       34.2       41.2

My current SQL query is written (dynamically) as such...

SELECT DataPoint.DateTime,
MAX(CASE WHEN SensorData.SensorId = 20 THEN SensorData.Temp END) Temp1,
MAX(CASE WHEN SensorData.SensorId = 21 THEN SensorData.Temp END) Temp2,
MAX(CASE WHEN SensorData.SensorId = 22 THEN SensorData.Temp END) Temp3
FROM DataPoint LEFT JOIN SensorData ON DataPoint.Id = SensorData.DataPointId
WHERE DataPoint.DateTime BETWEEN [x] AND [y]
GROUP BY DataPoint.DateTime ORDER BY DataPoint.DateTime

So my question has three parts:

1) Why is this particular query so slow?

2) Is there a better method for storing this information that I'm missing? I'm still in the design phase at this point. I chose this schema because I need to be able to compare timeseries information for sensors that fire off their data at irregular intervals.

3) Is there a faster way to query and translate this data to my desired format?

EDIT: Sorry, there is a GROUP BY clause at the end of my query that I initially forgot to include. My bad.

asked Oct 30 '22 by dweeb_plus_plus

1 Answer

  • First, make sure you have an index on DataPointId. If it is the primary key it probably already has one, but if it is a foreign key you will have to add one manually.

  • Second, you need an index covering SensorId and DateTime.

  • Third, filter on SensorId in the query itself; right now you are processing all of the million-plus records to generate the report:

WHERE SensorId IN (20, 21, 22)
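To make the index advice above concrete, here is a sketch of the DDL, assuming SQL Server and the simplified table and column names from the question (the index names are made up). The covering index on (SensorId, DateTime) with an INCLUDE on Temp lets the report be served from the index without key lookups:

CREATE INDEX IX_SensorData_DataPointId
    ON dbo.SensorData (DataPointId);          -- FK column: not indexed automatically

CREATE INDEX IX_SensorData_SensorId_DateTime
    ON dbo.SensorData (SensorId, [DateTime])
    INCLUDE (Temp);                           -- covers the SELECT list for the report

CREATE INDEX IX_DataPoint_DateTime
    ON dbo.DataPoint ([DateTime]);            -- supports the BETWEEN range filter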

You can also try the PIVOT function:

SqlFiddleDemo

SELECT [DateTime], [20] AS Temp1, [21] AS Temp2, [22] AS Temp3
FROM (
    SELECT [DateTime], SensorId, [Temp]
    FROM sensor
) AS SourceTable
PIVOT (
    MAX([Temp])
    FOR SensorId IN ([20], [21], [22])
) AS PivotTable;
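For reference, here is one way the asker's original conditional-aggregation query could look with the sensor filter applied. This is a sketch assuming SQL Server; note the filter is pushed into the join condition rather than the WHERE clause so the LEFT JOIN still returns datapoints with no matching rows:

SELECT dp.[DateTime],
       MAX(CASE WHEN sd.SensorId = 20 THEN sd.Temp END) AS Temp1,
       MAX(CASE WHEN sd.SensorId = 21 THEN sd.Temp END) AS Temp2,
       MAX(CASE WHEN sd.SensorId = 22 THEN sd.Temp END) AS Temp3
FROM dbo.DataPoint dp
LEFT JOIN dbo.SensorData sd
       ON sd.DataPointId = dp.Id
      AND sd.SensorId IN (20, 21, 22)  -- filtering here preserves LEFT JOIN semantics
WHERE dp.[DateTime] BETWEEN [x] AND [y]
GROUP BY dp.[DateTime]
ORDER BY dp.[DateTime];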
answered Nov 02 '22 by Juan Carlos Oropeza