I have a rather large .csv file (up to 1 million lines) that I want to generate and send when a browser requests it.
The current code I have is (except that I don't actually generate the same data):
import random
import tornado.ioloop
import tornado.web

class CSVHandler(tornado.web.RequestHandler):
    def get(self):
        self.set_header('Content-Type', 'text/csv')
        self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
        self.write('lineNumber,measure\r\n')  # File header
        for line in range(0, 1000000):
            # Mock data; random.random() returns a float, so convert it for join()
            self.write(','.join([str(line), str(random.random())]) + '\r\n')

app = tornado.web.Application([(r"/csv", CSVHandler)])
app.listen(8080)
tornado.ioloop.IOLoop.current().start()
The problems I have with the method above are:
By default, all data is buffered in memory until the end of the request so that it can be replaced with an error page if an exception occurs. To send a response incrementally, your handler must be asynchronous (so it can be interleaved with both the writing of the response and other requests on the IOLoop) and use the RequestHandler.flush() method.
Note that "being asynchronous" is not the same as "using the @tornado.web.asynchronous decorator"; in this case I recommend using @tornado.gen.coroutine instead of @asynchronous. This allows you to simply use the yield operator with every flush:
import random
import tornado.gen
import tornado.web

class CSVHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        self.set_header('Content-Type', 'text/csv')
        self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
        self.write('lineNumber,measure\r\n')  # File header
        for line in range(0, 1000000):
            # Mock data; random.random() returns a float, so convert it for join()
            self.write(','.join([str(line), str(random.random())]) + '\r\n')
            yield self.flush()  # wait until this line has been handed to the kernel
self.flush() starts the process of writing the data to the network, and yield waits until that data has reached the kernel. This lets other handlers run and also helps manage memory consumption (by limiting how far ahead of the client's download speed you can get). Flushing after every line of a CSV file is a little expensive, so you may want to only flush after every 100 or 1000 lines.
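One way to flush less often is to group lines into chunks first and then write and flush once per chunk. The sketch below is only an illustration in plain Python: csv_chunks and its batch_size parameter are hypothetical names, not part of Tornado, and the 2500-row mock data is just there to show the chunking.

```python
import random


def csv_chunks(rows, batch_size=1000):
    """Yield CSV text in chunks of up to `batch_size` lines each.

    `rows` is any iterable of sequences; every field is passed through str().
    (Hypothetical helper, not a Tornado API.)
    """
    batch = []
    for row in rows:
        batch.append(','.join(str(field) for field in row) + '\r\n')
        if len(batch) >= batch_size:
            yield ''.join(batch)
            batch = []
    if batch:  # emit the final, possibly shorter, chunk
        yield ''.join(batch)


# Mock data, as in the handler above: (line number, random measure).
rows = ((line, random.random()) for line in range(2500))
chunks = list(csv_chunks(rows, batch_size=1000))
print(len(chunks))  # 3 chunks: 1000 + 1000 + 500 lines
```

Inside the coroutine handler, the loop body would then be self.write(chunk) followed by yield self.flush() for each chunk, so the handler yields to the IOLoop once per 1000 lines instead of once per line.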
Note that if there is an exception once the download has started, there is no way to show an error page to the client; you can only cut the download off partway through. Try to validate the request and do everything that is likely to fail before the first call to flush().
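As a rough sketch of validating up front, suppose the handler accepted a rows query parameter limiting how many lines to generate. That parameter, the MAX_ROWS limit, and the parse_row_count helper are all assumptions made for this example; none of them appear in the question. The point is only that the check runs, and can fail, before anything is written.

```python
MAX_ROWS = 1000000  # hypothetical upper bound, matching the 1 million lines in the question


def parse_row_count(raw, default=MAX_ROWS):
    """Validate a hypothetical `rows` query parameter (a string or None).

    Raises ValueError on bad input so the caller can reject the request
    before any part of the response has been written or flushed.
    """
    if raw is None or raw == '':
        return default
    try:
        count = int(raw)
    except ValueError:
        raise ValueError('rows must be an integer, got %r' % (raw,))
    if not 0 < count <= MAX_ROWS:
        raise ValueError('rows must be between 1 and %d' % MAX_ROWS)
    return count


print(parse_row_count('250'))  # 250
```

In the handler this would be called at the top of get(), for example on self.get_argument('rows', None), before set_header() and the first write(). A ValueError caught there can be turned into tornado.web.HTTPError(400), which still produces a normal error page because nothing has been flushed yet.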