Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select a Portion of Vast Data Over Time with MySQL

Tags:

sql

mysql

I have hundreds of thousands of price points spanning 40 years plus. I would like to construct a query that will only return 3000 total data points, with the last 500 being the most recent data points, and the other 2500 being just a sample of the rest of the data, evenly distributed.

Is it possible to do this in one query? How would I select just a sample of the large amount of data? This is a small example of what I mean for getting just a sample of the other 2500 data points:

1
2
3    
4
5
6
7
8
9
10

And I want to return something like this:

1
5
10

Here's the query for the last 500:

SELECT * FROM price ORDER BY time_for DESC LIMIT 500

I'm not sure how to go about getting the sample data from the other data points.

like image 391
Kirk Ouimet Avatar asked Sep 06 '12 04:09

Kirk Ouimet


People also ask

How to LOAD data INTO table in MySQL?

mysql> LOAD DATA LOCAL INFILE '/path/pet. txt' INTO TABLE pet; If you created the file on Windows with an editor that uses \r\n as a line terminator, you should use this statement instead: mysql> LOAD DATA LOCAL INFILE '/path/pet.

How does mysql handle millions of records?

Show activity on this post. As already mentioned, fetching 2.5 mio entries requires loads of memory / cpu power. Try fetching the records in batches. If that's not solving your problem, you should consider finding a better way to not loop through such an amount of records each time.

How do I select top 10 rows in mysql?

To select first 10 elements from a database using SQL ORDER BY clause with LIMIT 10. Insert some records in the table using insert command. Display all records from the table using select statement. Here is the alternate query to select first 10 elements.

Which statement can you use to LOAD data from a file to a table in SQL?

The LOAD DATA statement reads rows from a text file into a table at a very high speed. The file can be read from the server host or the client host, depending on whether the LOCAL modifier is given.


Video Answer


1 Answers

Try this:

(SELECT * FROM price ORDER BY time_for DESC LIMIT 500)
UNION ALL
(SELECT * FROM price WHERE time_for < (SELECT time_for FROM price ORDER BY time_for LIMIT 500, 1) ORDER BY rand() LIMIT 2500)
ORDER BY time_for

Note: It's probably going to be slow. How big is this table?

It might be faster to only get the primary ID from all these rows, then join it to the original in a secondary query once it's narrowed down. This is because ORDER BY rand() LIMIT has to sort the entire table. If the table is large this can take a LONG time, and a lot of disk space. Retrieving only the ID reduces the necessary disk space.

like image 73
Ariel Avatar answered Sep 22 '22 17:09

Ariel