
Is it possible to generate a PDF with StreamingHttpResponse, as can be done with CSV for a large dataset?

I have a large dataset for which I need to generate both a CSV and a PDF. For the CSV, I follow this guide: https://docs.djangoproject.com/en/3.1/howto/outputting-csv/

import csv

from django.http import StreamingHttpResponse

class Echo:
    """An object that implements just the write method of the file-like
    interface.
    """
    def write(self, value):
        """Write the value by returning it, instead of storing in a buffer."""
        return value

def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    # Generate a sequence of rows. The range is based on the maximum number of
    # rows that can be handled by a single sheet in most spreadsheet
    # applications.
    rows = (["Row {}".format(idx), str(idx)] for idx in range(65536))
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    response = StreamingHttpResponse((writer.writerow(row) for row in rows),
                                     content_type="text/csv")
    response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
    return response

It works great. However, I can't find an equivalent approach for PDF. Is there one? I generate the PDF with render_to_pdf and a template.

good_evening asked Aug 10 '20 14:08


2 Answers

Think of CSV as a fruit salad. You can slice bananas into a big pot, add some grapefruit, some pineapple, and so on, then split the whole into individual portions that you bring to the table together (that is, you generate the entire CSV file and then send it to the client). But you could also make individual portions directly: cut some banana slices into a small bowl, add some grapefruit, some pineapple, bring this small bowl to the table, and repeat the process for the other portions (that is, you send the CSV file to the client part by part as you generate it).
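To make the "individual portions" idea concrete, here is a minimal sketch using only the standard library. It reuses the Echo class from the question; iter_csv_lines is a name made up for this illustration. It relies on csv.writer.writerow returning whatever the underlying write() returns, so each line exists on its own as soon as it is requested.

```python
import csv


class Echo:
    """File-like object whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_csv_lines(rows):
    """Yield one formatted CSV line per row, without building the whole file."""
    writer = csv.writer(Echo())
    for row in rows:
        # writerow() returns what Echo.write() returned: the formatted line.
        yield writer.writerow(row)


# A lazy source of rows: nothing is formatted until a line is requested.
rows = (["Row {}".format(idx), str(idx)] for idx in range(65536))
lines = iter_csv_lines(rows)
first_portion = next(lines)   # only the first row has been produced so far
second_portion = next(lines)  # the remaining rows still do not exist
```

Each portion is a complete, valid piece of the final file, which is why it can be sent immediately.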

If CSV is a fruit salad, then PDF is a cake. You have to mix all your ingredients and put them in the oven. This means you can't bring a slice of the cake to the table until you have baked the whole cake. Likewise, you can't start sending your PDF file to the client until it's entirely generated.

So, to answer your question, this (response = StreamingHttpResponse((writer.writerow(row) for row in rows), content_type="text/csv")) can't be done for PDF.

However, once your file is generated, you can stream it to the client using FileResponse as mentioned in other answers.
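As a rough illustration of what "stream it once it is generated" means, here is a standard-library sketch that reads a finished file in fixed-size chunks. This is not Django's actual implementation; iter_file_chunks and the placeholder bytes are assumptions made for the example. In a real view you would most likely just hand the open file to FileResponse, which the Django docs describe as streaming the file out in small chunks.

```python
import io


def iter_file_chunks(file_obj, chunk_size=8192):
    """Yield successive chunks of an already complete binary file."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for a finished PDF; real bytes would come from the PDF library.
finished_pdf = io.BytesIO(b"%PDF-1.4 placeholder bytes " * 1000)
chunks = list(iter_file_chunks(finished_pdf, chunk_size=4096))
```

The point is that the whole file exists before the first chunk is read, so memory use on the sending side stays bounded even though generation itself was not incremental.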

If your issue is that the generation of the PDF takes too much time (and might trigger a timeout error for instance), here are some things to consider:

  1. Try to optimize the speed of your generation algorithm.
  2. Generate the file in the background before the client requests it, and store it in your storage system. You might want to use a cron job or Celery to trigger the generation of the PDF without blocking the HTTP request.
  3. Use WebSockets to send the file to the client as soon as it is ready to be downloaded (see Django Channels).
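The background-generation option can be sketched as follows. This is only an illustration under loud assumptions: a ThreadPoolExecutor stands in for a real task queue such as Celery, and build_report, start_generation and is_ready are hypothetical names that write placeholder bytes rather than a real PDF.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a task queue such as Celery; a thread pool keeps the sketch
# self-contained but does not survive a server restart.
executor = ThreadPoolExecutor(max_workers=1)


def build_report(path):
    """Hypothetical slow generation step; writes placeholder bytes only."""
    with open(path, "wb") as fh:
        fh.write(b"%PDF-1.4 placeholder")
    return path


def start_generation(path):
    """Kick off generation without blocking the caller (the HTTP request)."""
    return executor.submit(build_report, path)


def is_ready(job):
    """A later request can poll this before offering the download."""
    return job.done() and job.exception() is None


output_path = os.path.join(tempfile.mkdtemp(), "report.pdf")
job = start_generation(output_path)
job.result(timeout=10)  # only for the demo; a view would return immediately
executor.shutdown()
```

With something along these lines, the request that triggers generation returns right away, and a later request serves the stored file once is_ready reports success.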
Antoine Pinsard answered Oct 04 '22 03:10


Have you tried FileResponse?

Something like this should work; it is essentially the example from the Django documentation:

import io
from django.http import FileResponse
from reportlab.pdfgen import canvas

def stream_pdf(request):
    # Build the whole PDF in an in-memory buffer.
    buffer = io.BytesIO()
    p = canvas.Canvas(buffer)
    p.drawString(10, 10, "Hello world.")
    p.showPage()
    p.save()
    # Rewind so FileResponse reads from the start of the buffer.
    buffer.seek(0)
    return FileResponse(buffer, as_attachment=True, filename='helloworld.pdf')
trinchet answered Oct 04 '22 03:10