
Using StreamingHttpResponse with Django Rest Framework CSV

I have a standard DRF web application that outputs CSV data for one of its routes. Rendering the entire CSV representation takes a while because the data set is quite large, so I want to return a streaming HTTP response so that the client doesn't time out.

However, using the example provided in https://github.com/mjumbewu/django-rest-framework-csv/blob/2ff49cff4b81827f3f450fd7d56827c9671c5140/rest_framework_csv/renderers.py#L197 doesn't quite accomplish this. The data is still sent as one large payload instead of being chunked, and the client ends up waiting for the whole response before any bytes are received.

The structure is similar to what follows:

models.py

class Report(models.Model):
  count = models.PositiveIntegerField(blank=True)
  ...

renderers.py

class ReportCSVRenderer(CSVStreamingRenderer):
  header = ['count']

serializers.py

class ReportSerializer(serializers.ModelSerializer):
  count = fields.IntegerField()

  class Meta:
    model = Report
    fields = ['count']  # recent DRF versions require 'fields' or 'exclude' on a ModelSerializer

views.py

class ReportCSVView(mixins.ListModelMixin, viewsets.GenericViewSet):
  def get_queryset(self):
    return Report.objects.all()

  def list(self, request, *args, **kwargs):
    queryset = self.get_queryset()
    data = ReportSerializer(queryset, many=True).data  # serializes the whole queryset up front
    renderer = ReportCSVRenderer()

    response = StreamingHttpResponse(renderer.render(data), content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="f.csv"'

    return response

NOTE: I had to leave out or change some details of the real code.

Thank you

3066d0 asked Oct 11 '17 18:10


2 Answers

A simpler solution, inspired by @3066d0's:

renderers.py

class ReportsRenderer(CSVStreamingRenderer):
    header = [ ... ]
    labels = { ... }

views.py

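from django.core.paginator import Paginator
from django.http import StreamingHttpResponse
from rest_framework.mixins import ListModelMixin
from rest_framework.viewsets import GenericViewSet

# Report, ReportCSVSerializer and ReportsRenderer come from the app's own modules.

class ReportCSVViewset(ListModelMixin, GenericViewSet):
    # Paginator expects a consistently ordered queryset, hence the explicit order_by.
    queryset = Report.objects.select_related('stuff').order_by('pk')
    serializer_class = ReportCSVSerializer
    renderer_classes = [ReportsRenderer]
    PAGE_SIZE = 1000  # rows fetched and serialized per database page

    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())
        # The renderer receives a generator, so rows are rendered as they are consumed.
        response = StreamingHttpResponse(
            request.accepted_renderer.render(self._stream_serialized_data(queryset)),
            status=200,
            content_type="text/csv",
        )
        response["Content-Disposition"] = 'attachment; filename="reports.csv"'
        return response

    def _stream_serialized_data(self, queryset):
        # Serialize one page at a time instead of the whole queryset.
        serializer = self.get_serializer_class()
        paginator = Paginator(queryset, self.PAGE_SIZE)
        for page in paginator.page_range:
            yield from serializer(paginator.page(page).object_list, many=True).data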

The point is that you need to pass a generator that yields serialized data as the data argument to the renderer; the CSVStreamingRenderer then does its thing and streams the response itself. I prefer this approach because you do not need to override any code from a third-party library.
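To illustrate why a generator makes the difference, here is a minimal plain-Python sketch of the same idea, without Django or DRF. It is not the CSVStreamingRenderer implementation: the helpers `iter_pages`, `stream_serialized` and `stream_csv` are hypothetical names made up for this example, and the "serializer" is just a dict with a `count` key. The `Echo` pseudo-buffer follows the pattern shown in the Django documentation for streaming large CSV files.

```python
import csv
from itertools import islice


class Echo:
    """Pseudo-buffer: write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_pages(rows, page_size):
    """Yield successive lists of at most page_size rows (hypothetical helper)."""
    iterator = iter(rows)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


def stream_serialized(rows, page_size):
    """Yield one serialized record per row, holding only one page at a time."""
    for page in iter_pages(rows, page_size):
        # Stand-in for running a serializer over a single page of objects.
        yield from ({"count": row} for row in page)


def stream_csv(records, header):
    """Yield CSV lines one by one; csv.writer returns whatever Echo.write returns."""
    writer = csv.writer(Echo())
    yield writer.writerow(header)
    for record in records:
        yield writer.writerow([record[name] for name in header])


if __name__ == "__main__":
    for line in stream_csv(stream_serialized(range(5), page_size=2), ["count"]):
        print(repr(line))
```

An iterator like the one returned by `stream_csv` is the kind of object a StreamingHttpResponse can consume: each line is produced only when the response asks for the next one, so the first bytes can be sent before the last page has been read.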

Andrii Vityk answered Oct 13 '22 00:10


Django's StreamingHttpResponse can be much slower than a traditional HttpResponse for small responses.

Don't use it if you don't need to; the Django docs actually recommend that "StreamingHttpResponse should only be used in situations where it is absolutely required that the whole content isn't iterated before transferring the data to the client."

For your problem, you may also find it useful to set the chunk_size, switch to FileResponse, or go back to a normal Response (if using the REST framework) or HttpResponse.

Edit 1: About setting the chunk size:

With Django's File API you can read a file in chunks, so the whole file is never loaded into memory at once.
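As a rough illustration of chunked reading, the sketch below uses only the standard library. `read_in_chunks` is a hypothetical helper written for this example; in Django itself, `File.chunks()` plays a similar role, and nothing here reproduces its implementation.

```python
import io


def read_in_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks of at most chunk_size bytes from fileobj."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


if __name__ == "__main__":
    # A small in-memory "file" stands in for a large CSV on disk.
    buffer = io.BytesIO(b"a,b\n1,2\n3,4\n")
    for chunk in read_in_chunks(buffer, chunk_size=5):
        print(chunk)
```

Only one chunk is held in memory at a time, and a generator like this can be handed to a streaming response in the same way as a generator of CSV lines.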

I hope you find this useful.

trenixjetix answered Oct 13 '22 00:10