Making a web interface to a script that takes 30 minutes to execute

I wrote a Python script to process some data from CSV files. The script takes between 3 and 30 minutes to complete, depending on the size of the CSV.

Now I want to put a web interface on this, so I can upload the CSV data files from anywhere. I wrote a basic HTTP POST upload page and used Python's CGI module, but the script just times out after a while.

The script outputs HTTP headers at the start, and outputs bits of data as it iterates over each line of the CSV. As an example, this print statement triggers every 30 seconds or so.

# at the very top, with the 'import's
print "Content-type: text/html\n\n Processing ... <br />"

# the really long loop
count = 0
for currentRecord in csvRecords:
    count = count + 1
    print "On line " + str(count) + " <br />"

I assumed the browser would receive the headers and then wait, since it keeps receiving little bits of data. But what actually seems to happen is that it doesn't receive any data at all, and the request fails with a 504 timeout when given a CSV with lots of lines.

Perhaps there's some caching happening somewhere? From the logs:

[Wed Jan 20 16:59:09 2010] [error] [client ::1] Script timed out before returning headers: datacruncher.py, referer: http://localhost/index.htm
[Wed Jan 20 17:04:09 2010] [warn] [client ::1] Timeout waiting for output from CGI script /Library/WebServer/CGI-Executables/datacruncher.py, referer: http://localhost/index.htm

What's the best way to resolve this, or is it not appropriate to run such scripts in a browser?

Edit: This is a script for my own use - I normally intend to use it on my own computer, but I thought a web-based interface could come in handy while travelling, or, for example, from a phone. Also, there's really nothing to download - the script will most probably e-mail a report at the end.

asked Jan 20 '10 by Pranab



2 Answers

I would separate the work like this (a rough sketch follows the list):

  1. A web app URL that accepts the POSTed CSV file. The web app puts the CSV content into an offline queue, for instance a database table. The web app's response should be a unique ID for the queued item (use an auto-incremented ID column, for instance). The client must store this ID for part 3.

  2. A stand-alone service app that polls the queue for work, and does the processing. Upon completion of the processing, store the results in another database table, using the unique ID as a key.

  3. A web app URL that can fetch processed results, http://server/getresults/uniqueid/. If the processing is finished (i.e. the unique ID is found in the results database table), return the results. If not finished, the response should be something that indicates this: for instance a custom HTTP header, an HTTP status code, or a response body of 'PENDING' or similar.
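
A minimal sketch of parts 1 and 3, assuming Flask and SQLite purely for illustration (any web framework and queue store would do; the /submit and /getresults routes and the "csvfile" form field are made-up names, not something from the question):

import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)
DB = "jobs.db"

def init_db():
    # one table doubles as the queue (csv_data) and the results store (result)
    with sqlite3.connect(DB) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS jobs ("
                     "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                     "csv_data TEXT NOT NULL, result TEXT)")

@app.route("/submit", methods=["POST"])
def submit():
    # part 1: queue the uploaded CSV and hand back its unique ID
    csv_data = request.files["csvfile"].read().decode("utf-8")
    with sqlite3.connect(DB) as conn:
        cur = conn.execute("INSERT INTO jobs (csv_data) VALUES (?)", (csv_data,))
        return jsonify({"id": cur.lastrowid})

@app.route("/getresults/<int:job_id>")
def getresults(job_id):
    # part 3: return the result if it is ready, otherwise signal PENDING
    with sqlite3.connect(DB) as conn:
        row = conn.execute("SELECT result FROM jobs WHERE id = ?",
                           (job_id,)).fetchone()
    if row is None:
        return "UNKNOWN ID", 404
    if row[0] is None:
        return "PENDING", 202
    return row[0]

if __name__ == "__main__":
    init_db()
    app.run()

Part 2 is then a separate process that polls the same table, runs the existing 3-30 minute processing on each queued CSV, and writes its output into the result column. The process_csv() function below is a stand-in for the question's actual processing code:

import sqlite3
import time

def process_csv(csv_data):
    # stand-in for the existing long-running processing
    return "processed %d lines" % len(csv_data.splitlines())

while True:
    with sqlite3.connect("jobs.db") as conn:
        row = conn.execute("SELECT id, csv_data FROM jobs "
                           "WHERE result IS NULL "
                           "ORDER BY id LIMIT 1").fetchone()
    if row is not None:
        job_id, csv_data = row
        result = process_csv(csv_data)      # the long part, done outside any transaction
        with sqlite3.connect("jobs.db") as conn:
            conn.execute("UPDATE jobs SET result = ? WHERE id = ?",
                         (result, job_id))
    time.sleep(5)                           # poll the queue every few seconds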

answered Oct 03 '22 by codeape


I've had this situation before and I used cron jobs. The HTTP script would just write a job to be performed into a queue (a DB row or a file in a directory), and the cron job would read the queue and execute that job. A rough sketch follows.
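
A minimal sketch of that setup, assuming a spool directory as the queue (the paths, the "csvfile" field name and the process_csv() helper are invented for illustration). The upload handler just drops each CSV into the spool:

# upload.py (CGI): write the uploaded CSV into the queue directory
import cgi
import os
import time

SPOOL = "/var/spool/datacruncher"        # assumed queue directory

form = cgi.FieldStorage()
job_name = "%d.csv" % int(time.time())
with open(os.path.join(SPOOL, job_name), "wb") as out:
    out.write(form["csvfile"].file.read())

print("Content-type: text/plain\n")
print("Queued as " + job_name)

The cron side then looks for queued files and runs the existing processing on each one, for example every five minutes:

# process_queue.py, run from cron ("*/5 * * * * python process_queue.py")
import glob
import os
import shutil

SPOOL = "/var/spool/datacruncher"
DONE = os.path.join(SPOOL, "done")       # processed files are moved here

def process_csv(path):
    pass                                 # stand-in for the existing 3-30 minute script

for path in sorted(glob.glob(os.path.join(SPOOL, "*.csv"))):
    process_csv(path)
    shutil.move(path, os.path.join(DONE, os.path.basename(path)))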

answered Oct 03 '22 by sharjeel