Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL select every xth row based on total number

Tags:

sql

postgresql

I have a table with gps data:

lat, lon, timestamp

My queries select by a timestamp window:

SELECT * FROM gps WHERE timesamp >= '2016-12-12T02:00:00Z' AND timestamp <= '2016-12-12T03:00:00Z'

The resultset becomes way to big for larger timespans and I want to limit the size of the resultset. However, I still want the data to cover the entire timespan, but with reduced resolution. So instead of

LIMIT 1000

I want to select every n'th element based on my specified max resultset size and the actual size (count) of the data.

I want something like this

limit = 1000
totalCount = SELECT COUNT(*) FROM gps WHERE timestamp >= ...
resolution = totalCount / limit
SELECT every resolution'th FROM SELECT * FROM gps WHERE timestamp >= ...
like image 509
Chris Avatar asked Dec 29 '16 11:12

Chris


2 Answers

Work with ROW_NUMBER(), which is a "window function":

SELECT s.* FROM (
    SELECT t.*,
           ROW_NUMBER() OVER(ORDER BY t.timestamp) as rnk,
           COUNT(*) OVER() as total_cnt
    FROM gps t
    WHERE t.timestamp between '2016-12-12T02:00:00Z' AND '2016-12-12T03:00:00Z') s
WHERE MOD(s.rnk,(total_cnt/1000)) = 0
like image 171
sagi Avatar answered Sep 16 '22 12:09

sagi


An alternative is to randomize the data and then use limit:

SELECT *
FROM gp
WHERE timesamp >= '2016-12-12T02:00:00Z' AND timestamp <= '2016-12-12T03:00:00Z'
ORDER BY random()
LIMIT x;

This doesn't guarantee an exact distribution across all timestamps. But, it does make it possible to get exactly 1000 values.

like image 39
Gordon Linoff Avatar answered Sep 16 '22 12:09

Gordon Linoff