Download large CSV file to browser while it is being generated

Tags: php, csv, download

I have a script that generates a large CSV file using fputcsv and sends it to the browser. It works, but the browser doesn't show the file download prompt (or start downloading the file) until the whole CSV file has been generated server-side, which takes a long time.

Instead, I'd like the download to begin while the remainder of the file is still being generated. I know this is possible because it's how the 'Export database' option in phpMyAdmin works: the download starts as soon as you click the 'Export' button, even if your database is huge.

How can I tweak my existing code, below, to let the download begin immediately?

$csv = 'title.csv';
header( "Content-Type: text/csv;charset=utf-8" );
header( "Content-Disposition: attachment;filename=\"$csv\"" );
header( "Pragma: no-cache" );
header( "Expires: 0" );

$fp = fopen('php://output', 'w');
fputcsv($fp, array_keys($array), ';', '"');

foreach ($array as $fields) 
{
    fputcsv($fp, $fields, ';', '"');
}

fclose($fp);
exit();
SpiderQ asked Dec 26 '22 09:12


1 Answer

Empirically, it seems that when receiving responses featuring a Content-Disposition: attachment header, different browsers will show the file download dialog at the following moments:

  • Firefox shows the dialog as soon as it receives the headers.
  • Internet Explorer shows the dialog once it has received the headers plus 255 bytes of the response body.
  • Chromium shows the dialog once it has received the headers plus 1023 bytes of the response body.

Our objectives, then, are as follows:

  1. Flush the first kilobyte of the response body to the browser as soon as possible, so that Chrome users see the file download dialog at the earliest possible moment.
  2. Thereafter, regularly send more content to the browser.
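Both objectives can be sketched in one loop, assuming PHP 7.1 or later (for the `iterable` and nullable type hints): count the bytes that `fputcsv()` reports writing, flush as soon as the running total first reaches about 1 KB, and afterwards flush every `$flushEvery` rows. The name `stream_csv_rows()` and its `$flusher` parameter are hypothetical; `$flusher` defaults to PHP's own `flush()` and exists only so the flushing can be observed without a web server.

```php
<?php
// Hypothetical helper: writes CSV rows to $fp, flushing once the first
// $firstFlushBytes bytes have been written (1024 matches the Chromium
// threshold above) and then every $flushEvery rows.
// Returns the total number of flushes performed.
function stream_csv_rows($fp, iterable $rows, int $firstFlushBytes = 1024,
                         int $flushEvery = 100, ?callable $flusher = null): int
{
    $flusher = $flusher ?? 'flush';   // default: PHP's flush()
    $bytes = 0;                       // bytes written so far
    $firstFlushDone = false;
    $rowsSinceFlush = 0;
    $flushes = 0;

    foreach ($rows as $fields) {
        $written = fputcsv($fp, $fields, ';', '"', '\\');
        if ($written === false) {
            throw new RuntimeException('fputcsv() failed');
        }
        $bytes += $written;
        $rowsSinceFlush++;

        $reachedFirstKb = !$firstFlushDone && $bytes >= $firstFlushBytes;
        if ($reachedFirstKb || $rowsSinceFlush >= $flushEvery) {
            fflush($fp);   // flush the stream's own buffer, if any
            $flusher();    // then ask PHP and the server to send the output on
            $firstFlushDone = $firstFlushDone || $bytes >= $firstFlushBytes;
            $rowsSinceFlush = 0;
            $flushes++;
        }
    }

    fflush($fp);
    $flusher();            // send whatever is left at the end
    return $flushes + 1;
}
```

In a download script this would stand in for the `foreach` loop, e.g. `stream_csv_rows(fopen('php://output', 'w'), $rows);`. Counting bytes rather than rows means the first flush happens at roughly the right moment regardless of how wide each CSV row is.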

Standing in the way of these objectives are, potentially, multiple levels of buffering, which you can try to fight in different ways.

PHP's output buffer

If you have output_buffering set to a value other than Off, PHP will automatically create an output buffer that stores all output your script tries to send to the response body. You can prevent this by setting output_buffering to Off in your php.ini file, or in per-directory or web server configuration such as an .htaccess file or your Apache or nginx config. Note that output_buffering is a PHP_INI_PERDIR setting, so it cannot be changed with ini_set() from inside the script. Alternatively, you can turn off the output buffer, if one exists, at the start of your script using ob_end_flush() or ob_end_clean():

if (ob_get_level()) {
    ob_end_clean();
}
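The `if` above removes only the innermost buffer. If more than one level can be active (for example one from output_buffering plus one opened by a framework or other code that runs before yours; whether that happens depends on your setup), looping until none remain is safer. A minimal sketch; `close_output_buffers()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: discard every active output buffer, not just the
// innermost one. Returns the names of the buffers that were closed.
function close_output_buffers(): array
{
    $closed = [];
    while (ob_get_level() > 0) {
        $status = ob_get_status();            // status of the innermost buffer
        $name = $status['name'] ?? 'unknown';
        if (!ob_end_clean()) {                // a buffer started without the
            break;                            // removable flag cannot be ended
        }
        $closed[] = $name;
    }
    return $closed;
}
```

Calling it as the first statement of the download script, before any header() call, means anything buffered earlier is discarded rather than sent as part of the CSV.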

Buffering done by your web server

Once your output gets past the PHP output buffer, it may be buffered by your web server. You can try to get around this by calling flush() regularly (e.g. every 100 lines), although the PHP manual is hesitant about providing any guarantees, listing some particular cases where this may fail:

flush

...

Flushes the write buffers of PHP and whatever backend PHP is using (CGI, a web server, etc). This attempts to push current output all the way to the browser with a few caveats.

flush() may not be able to override the buffering scheme of your web server ...

Several servers, especially on Win32, will still buffer the output from your script until it terminates before transmitting the results to the browser.

Server modules for Apache like mod_gzip may do buffering of their own that will cause flush() to not result in data being sent immediately to the client.

You can alternatively have PHP call flush() automatically after every output call by calling ob_implicit_flush() at the start of your script. Beware, though, that if you have gzip enabled via a mechanism that respects flush() calls, such as Apache's mod_deflate module, this constant flushing will cripple its compression and will probably make your 'compressed' output larger than the uncompressed original. Explicitly calling flush() every n lines of output, for some modest but non-tiny n, is therefore probably the better practice.
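Which layer is buffering depends on your server. If it is nginx in front of PHP (PHP-FPM over FastCGI, or PHP behind proxy_pass), nginx buffers responses by default, and per its documentation a response can switch that buffering off by sending an `X-Accel-Buffering: no` header; other servers ignore the header. A sketch under that assumption; `csv_download_headers()` is a hypothetical helper that returns the question's headers plus this one as strings, so they can be inspected before being passed to header():

```php
<?php
// Hypothetical helper: the question's download headers plus
// "X-Accel-Buffering: no", which asks nginx (if present) not to buffer
// this response. Quotes, backslashes and line breaks are stripped from the
// filename so they cannot break the Content-Disposition header.
function csv_download_headers(string $filename): array
{
    $safe = str_replace(['"', '\\', "\r", "\n"], '', $filename);
    return [
        'Content-Type: text/csv;charset=utf-8',
        'Content-Disposition: attachment;filename="' . $safe . '"',
        'Pragma: no-cache',
        'Expires: 0',
        'X-Accel-Buffering: no',
    ];
}

// In the download script, before any output:
// foreach (csv_download_headers('title.csv') as $line) {
//     header($line);
// }
```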

Putting it all together, then, you should probably tweak your script to look something like this:

<?php

    // Discard any active output buffers; there may be more than one level
    while (ob_get_level() > 0 && ob_end_clean()) {
        // keep going until no buffers remain or one refuses to close
    }

    $csv = 'title.csv';
    header( "Content-Type: text/csv;charset=utf-8" );
    header( "Content-Disposition: attachment;filename=\"$csv\"" );
    header( "Pragma: no-cache" );
    header( "Expires: 0" );

    flush(); // Get the headers out immediately to show the download dialog
             // in Firefox

    $array = get_your_csv_data(); // This needs to be fast, of course

    $fp = fopen('php://output', 'w');

    // Header row: column names taken from the keys of the first data row
    $first = reset($array);
    if ($first !== false) {
        fputcsv($fp, array_keys($first), ';', '"');
    }

    $i = 0;
    foreach ($array as $fields) 
    {
        fputcsv($fp, $fields, ';', '"');
        if (++$i % 100 == 0) {
            flush(); // Attempt to flush output to the browser every 100 lines.
                     // You may want to tweak this number based upon the size of
                     // your CSV rows.
        }
    }

    fclose($fp);

If this doesn't work, then I don't think there's anything more you can do from your PHP code to try to resolve the problem - you need to figure out what's causing your web server to buffer your output and try to solve that using your server's configuration files.
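The script above still waits for `get_your_csv_data()` to return the whole array before writing anything. If the data comes from a slow source, one option (a sketch, not part of the approach above) is a generator that yields one row at a time, so the first rows are written and flushed while later rows are still being produced. `fetch_rows()` and `write_csv()` are hypothetical names, and the fake rows stand in for something like a database cursor:

```php
<?php
// Hypothetical row source: a generator yields rows one at a time, so the
// caller can start writing before the full dataset exists. A real script
// might yield each row from a database cursor instead of these fake rows.
function fetch_rows(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield ['id' => $i, 'title' => "item-$i"];
    }
}

// Writes a header row (keys of the first row) followed by every row,
// calling flush() every $flushEvery rows. Returns the number of data rows.
function write_csv($fp, iterable $rows, int $flushEvery = 100): int
{
    $count = 0;
    foreach ($rows as $fields) {
        if ($count === 0) {
            fputcsv($fp, array_keys($fields), ';', '"', '\\');
        }
        fputcsv($fp, $fields, ';', '"', '\\');
        $count++;
        if ($count % $flushEvery === 0) {
            fflush($fp);
            flush();
        }
    }
    return $count;
}

// Usage in the download script:
// $fp = fopen('php://output', 'w');
// write_csv($fp, fetch_rows(100000));
// fclose($fp);
```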

Mark Amery answered May 17 '23 15:05