Is it possible to generate PDF with StreamingHttpResponse as it's possible to do so with CSV for large dataset?

I have a large dataset for which I need to generate both CSV and PDF files. For CSV, I follow this guide: https://docs.djangoproject.com/en/3.1/howto/outputting-csv/

import csv

from django.http import StreamingHttpResponse

class Echo:
    """An object that implements just the write method of the file-like
    interface.
    """
    def write(self, value):
        """Write the value by returning it, instead of storing in a buffer."""
        return value

def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    # Generate a sequence of rows. The range is based on the maximum number of
    # rows that can be handled by a single sheet in most spreadsheet
    # applications.
    rows = (["Row {}".format(idx), str(idx)] for idx in range(65536))
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    response = StreamingHttpResponse((writer.writerow(row) for row in rows),
                                     content_type="text/csv")
    response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
    return response

It works great. However, I can't find an equivalent approach for PDF. Is there one? I generate the PDF from a template using render_to_pdf.

asked Aug 10 '20 14:08

good_evening

2 Answers

Think of CSV as a fruit salad. You can slice bananas into a big pot, add some grapefruit, some pineapple, and so on, then split the whole thing into individual portions that you bring to the table together (that is: you generate the whole CSV file, and then you send it to the client). But you could also make the individual portions directly: cut a few slices of banana into a small bowl, add some grapefruit and some pineapple, bring this small bowl to the table, and repeat the process for the other portions (that is: you send the CSV file to the client part by part as you generate it).

Well, if CSV is a fruit salad, then PDF is a cake. You have to mix all your ingredients and put them in the oven. This means you can't bring a slice of the cake to the table until you have baked the whole cake. Likewise, you can't start sending your PDF file to the client until it's entirely generated.

So, to answer your question, this (response = StreamingHttpResponse((writer.writerow(row) for row in rows), content_type="text/csv")) can't be done for PDF.

However, once your file is generated, you can stream it to the client using FileResponse as mentioned in other answers.
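To make "generate first, then stream" concrete, here is a minimal sketch using only the standard library. The names iter_file_chunks and CHUNK_SIZE are hypothetical and the PDF bytes are a placeholder; in Django, FileResponse already reads the file in chunks for you, so you would not normally write this yourself.

```python
import io
from functools import partial

CHUNK_SIZE = 8192  # assumed chunk size, chosen for illustration only


def iter_file_chunks(file_obj, chunk_size=CHUNK_SIZE):
    """Yield the content of an already generated file piece by piece."""
    # iter() with a sentinel keeps calling file_obj.read(chunk_size) until
    # it returns b"", which signals the end of the file.
    yield from iter(partial(file_obj.read, chunk_size), b"")


# Stand-in for a fully generated PDF; a real one would come from your PDF library.
finished_pdf = io.BytesIO(b"%PDF-1.4 " + b"x" * 20000)
chunks = list(iter_file_chunks(finished_pdf))
```

The point is that the chunking happens after generation: the client receives the file progressively, but only once the whole document exists.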

If your issue is that generating the PDF takes too much time (and might trigger a timeout error, for instance), here are some things to consider:

- Try to optimize the speed of your generation algorithm.
- Generate the file in the background before the client requests it, and store it in your storage system. You might want to use a cron job or Celery to trigger the generation of the PDF without blocking the HTTP request.
- Use WebSockets to notify the client, or send it the file, as soon as it is ready to be downloaded (see Django Channels).
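The background-generation option could be sketched roughly as follows. This is only an illustration under assumptions: a ThreadPoolExecutor stands in for a real task queue such as Celery, and build_report, report_if_ready and OUTPUT_DIR are hypothetical names, with placeholder bytes instead of a real PDF.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Stand-in for a task queue such as Celery; a thread pool keeps the sketch self-contained.
executor = ThreadPoolExecutor(max_workers=1)
OUTPUT_DIR = Path(tempfile.mkdtemp())  # hypothetical storage location


def build_report(report_id):
    """Hypothetical slow job: write the finished file to storage."""
    path = OUTPUT_DIR / f"report-{report_id}.pdf"
    tmp_path = path.with_suffix(".part")
    # Placeholder bytes; a real job would render the PDF here.
    tmp_path.write_bytes(b"%PDF-1.4 placeholder")
    # Rename only once the file is complete, so readers never see a partial file.
    tmp_path.rename(path)
    return path


def report_if_ready(report_id):
    """What a view could call: return the path if the file exists, else None."""
    path = OUTPUT_DIR / f"report-{report_id}.pdf"
    return path if path.exists() else None


future = executor.submit(build_report, 42)  # triggered ahead of the request
future.result()  # waiting here only makes the example deterministic
ready_path = report_if_ready(42)
```

In a real view you would not wait on the job; you would return the stored file if report_if_ready finds it, and otherwise tell the client to retry later.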

answered Oct 04 '22 03:10

Antoine Pinsard

Have you tried FileResponse?

Something like this should work. It is basically the example from the Django docs:

import io

from django.http import FileResponse
from reportlab.pdfgen import canvas

def stream_pdf(request):
    # Build the whole PDF in an in-memory buffer first.
    buffer = io.BytesIO()
    p = canvas.Canvas(buffer)
    p.drawString(10, 10, "Hello world.")
    p.showPage()
    p.save()
    # Rewind to the start so FileResponse reads the PDF from the beginning.
    buffer.seek(0)
    return FileResponse(buffer, as_attachment=True, filename='helloworld.pdf')
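For a large PDF, note that io.BytesIO keeps the whole document in memory. One possible variation is tempfile.SpooledTemporaryFile from the standard library, which behaves like a file object but moves its content to disk once it grows past max_size. This is an assumption on top of the example above, not something taken from the Django docs, and I have not verified it against ReportLab; write_fake_pdf is a hypothetical stand-in for the ReportLab calls.

```python
import tempfile

MAX_IN_MEMORY = 1024 * 1024  # assumed threshold (1 MiB), for illustration only


def write_fake_pdf(file_obj, size):
    """Hypothetical stand-in for the PDF library writing into file_obj."""
    file_obj.write(b"%PDF-1.4\n")
    file_obj.write(b"0" * size)


buffer = tempfile.SpooledTemporaryFile(max_size=MAX_IN_MEMORY)
# Writing more than max_size makes the buffer roll over to a real temporary file.
write_fake_pdf(buffer, 2 * MAX_IN_MEMORY)
buffer.seek(0)  # rewind before handing the file to the response
header = buffer.read(8)
```

The rest of the view would stay the same: rewind the buffer and pass it to the response.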

answered Oct 04 '22 03:10

trinchet

Tags: python, django, large-data, streaminghttpresponse
