Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best architecture for a 30 + hour query

I have an interesting problem to solve. One of my clients has me developing a stock analysis program with close to 50 years of stock data for almost a thousand symbols. I've developed a series of filters that are applied on any given day to see if anything falls out for a trade.

We want to run this filter for each day of data we have for each stock. Basically your begin and end date type report. However it takes 6 minutes to filter each week for each symbol. We are figuring about 40 hours or so to run the report on our entire data set.

The overriding requirement is that my client is able to do anything in the application from any computer anywhere (he travels a lot), so we are browser based.

To solve this issue, I wrote an asychronous method that runs this report, however the application pool inactivity timer will kill the job. I don't want to have to start adjusting timeouts for the entire application to support this one report (we are going to do a lot of these as every stock scenerio will need to be run against our entire dataset for analysis before it gets used for active trading).

Does anyone have any general ideas or experiences with a web architecture that will support ultra-long asychronous processes?

Thanks

like image 473
Mike Malter Avatar asked Jul 07 '10 21:07

Mike Malter


People also ask

How many queries MySQL can handle?

MySQL can run more than 50,000 simple queries per second on commodity server hardware and over 2,000 queries per second from a single correspondent on a Gigabit network, so running multiple queries isn't necessarily such a bad thing.

How long do you think it would take a team to query a dataset like this manually?

The query takes 20 to 500 ms (or sometimes more) depending on the system and the amount of data. The performance of the database or the database server has a significant influence on the speed.


3 Answers

As a general suggestion I would recommend a standalone Windows Service, Console App or similar with very careful lifetime controls and logging, which would run constantly and check (poll) for 'jobs to process' in a database, then update the database with results and progress information.

It may not be the best way but I've used it before many times and it's reliable, scalable and has good performance.

Best to keep web requests to a minute or two maximum - they were never designed for heavy processing times. This way you can 'check in' on the job status every minute or so (using a Web Service).

If you have any questions of me or about the idea please post a comment & I will be happy to help, elaborate or suggest..

Hope that helps!


(Additional: I believe Windows Services are underused! All it takes is a quick base class or collection of reusable helper methods and you've got a logged, reliable, automatic, configurable, quick-to-implement process running under your control. Quick to prototype with too!)

like image 171
Kieren Johnstone Avatar answered Oct 06 '22 17:10

Kieren Johnstone


Is there any reason not to simply run a service in the background and archive individual resultsets to a read only results table as they are requested? Do you need to run the query in realtime? The app could retrieve pages of results as they get generated by the service.

like image 27
Mike Burton Avatar answered Oct 06 '22 18:10

Mike Burton


It sounds like you are doing SQL queries directly against these data. Have you considered loading the data to e.g. SQL Server Analysis Services and setting up a cube with (for starters) time, stock and symbol dimensions? Depending on the nature of your queries, you may get into quite reasonable response times. Relational databases are good for online transaction processing (within certain load and response time parameters), but analytical work sometimes requires the methods and technologies of data warehouses instead. (Or, perhaps, associative databases... there are alternatives.)

However, considering Murphy, you'll probably have some long running queries. Do the data vary for different end users? If not, why not precompute answers? Nothing http based should take more than a minute to process, if at that -- at least not by design!

like image 32
Pontus Gagge Avatar answered Oct 06 '22 18:10

Pontus Gagge