
Best practice for processing a lot of data while the user waits (in Rails)?

I have a bookmarklet that, when used, submits all of the URLs on the current browser page to a Rails 3 app for processing. Behind the scenes I'm using Typhoeus to check that each URL returns a 2XX status code. Currently I initiate this process via an AJAX request to the Rails server and simply wait while it processes and returns the results. For a small set, this is very quick, but when the number of URLs is quite large, the user can be waiting for up to, say, 10-15 seconds.

I've considered using Delayed Job to process this outside the user's thread, but this doesn't seem like quite the right use-case. Since the user needs to wait until the processing is finished to see the results and Delayed Job may take up to five seconds before the job is even started, I can't guarantee that the processing will happen as soon as possible. This wait time isn't acceptable in this case unfortunately.

Ideally, what I think should happen is this:

  • User hits bookmarklet
  • Data is sent to the server for processing
  • A waiting page is instantly returned while spinning off a thread to do the processing
  • The waiting page periodically polls via ajax for the results of the processing and updates the waiting page (ex: "4 of 567 URLs processed...")
  • The waiting page is updated with the results once they are ready
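
The flow above can be sketched in plain Ruby as a rough illustration rather than a finished design. `UrlCheckJobs`, `start`, and `status` are hypothetical names, the block passed to `start` stands in for the real URL check, and the in-memory hash is an assumption that only holds inside a single process. With more than one server process, the progress would have to live in the database or another shared store.

```ruby
require "securerandom"

# Hypothetical in-process registry of running URL-check jobs.
# Assumption: one server process; use a shared store otherwise.
class UrlCheckJobs
  def initialize
    @jobs = {}
    @lock = Mutex.new
  end

  # Registers the job and returns [job_id, worker_thread] immediately,
  # so the request can render the waiting page without blocking.
  def start(urls, &checker)
    job_id = SecureRandom.hex(8)
    @lock.synchronize do
      @jobs[job_id] = { total: urls.size, done: 0, results: {}, finished: false }
    end

    worker = Thread.new do
      urls.each do |url|
        ok = begin
          checker.call(url)
        rescue StandardError
          false # a check that raises is recorded as a failure
        end
        @lock.synchronize do
          @jobs[job_id][:results][url] = ok
          @jobs[job_id][:done] += 1
        end
      end
      @lock.synchronize { @jobs[job_id][:finished] = true }
    end

    [job_id, worker]
  end

  # What the polling endpoint would return for a given job id.
  def status(job_id)
    @lock.synchronize do
      job = @jobs.fetch(job_id)
      {
        message: "#{job[:done]} of #{job[:total]} URLs processed...",
        finished: job[:finished],
        results: job[:results].dup
      }
    end
  end
end
```

A controller action would call `start`, hand the job id to the waiting page, and a second action would return `status(job_id)` as JSON on each poll.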

Some extra details:

  • I'm using Heroku (long-running processes are killed after 30 seconds)
  • Both logged in and anonymous users can use this feature

Is this a typical way to do this, or is there a better way? Should I just roll my own off-thread processing that updates the DB during processing or is there something like Delayed Job that I can use for this (and that works on Heroku)? Any pushes in the right direction would be much appreciated.

markquezada asked Nov 14 '22 05:11


1 Answer

I think your latter idea makes the most sense. I would offload each URL check to its own thread, so that all the checks run concurrently (which should be a lot faster than sequential checks anyway). As each check finishes, it updates the database, taking care that the threads don't step on each other's writes. An AJAX endpoint, which, as you said, you poll regularly on the client side, grabs the count of completed checks from the database and returns it. This is a simple enough method that I don't really see the need for any extra components.
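
A minimal sketch of that idea, under some loud assumptions: `CheckProgress` is a hypothetical in-memory stand-in for the database table, `check_urls_concurrently` is an illustrative helper, and `checker` is any callable that returns true for a 2XX response (in the real app it would wrap the existing Typhoeus call). The mutex plays the role of whatever keeps concurrent writes from clobbering each other.

```ruby
# Hypothetical stand-in for the database rows the threads would update.
class CheckProgress
  def initialize(total)
    @total = total
    @results = {}
    @lock = Mutex.new
  end

  # Called by each worker thread when its check finishes.
  def record(url, ok)
    @lock.synchronize { @results[url] = ok }
  end

  # What the polled AJAX endpoint would read and return.
  def snapshot
    @lock.synchronize do
      {
        total: @total,
        completed: @results.size,
        failed: @results.reject { |_url, ok| ok }.keys
      }
    end
  end
end

# Starts one thread per distinct URL and returns the threads without
# waiting for them, so the caller can respond to the user right away.
def check_urls_concurrently(urls, progress, checker)
  urls.uniq.map do |url|
    Thread.new do
      ok = begin
        checker.call(url)
      rescue StandardError
        false # a check that raises counts as a failed URL
      end
      progress.record(url, ok)
    end
  end
end
```

One thread per URL is fine for a sketch, but for pages with hundreds of links a fixed-size pool of workers pulling from a queue would probably be kinder to the server.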

Ben Lee answered Dec 07 '22 00:12