Question:
How to read and echo the file size of an uploaded file being written at the server in real time, without blocking at either the server or the client?
Context:
Progress of a file upload being written to the server from a POST request made by fetch(), where body is set to a Blob, File, TypedArray, or ArrayBuffer object.
The current implementation sets a File object as the body passed to the second parameter of fetch().
Requirement:
Read and echo to the client the file size of the file being written to the filesystem at the server, as text/event-stream. Stop when all of the bytes, provided as a query string parameter of the GET request, have been written. The read of the file currently takes place in a separate script environment, where a GET call to the script which should read the file is made following the POST to the script which writes the file to the server.
Have not reached error handling of potential issues with the write of the file to the server, or the read of the file to get the current file size, though that would be the next step once the echo of the file size portion is complete.
Presently attempting to meet the requirement using php, though also interested in c, bash, nodejs, python, or other languages or approaches which can be used to perform the same task.
The client-side javascript portion is not an issue; simply not that versed in php, one of the most common server-side languages used on the world wide web, to implement the pattern without including unnecessary parts.
Motivation:
Progress indicators for fetch?
Related:
Fetch with ReadableStream
Issues:
Getting
PHP Notice: Undefined index: HTTP_LAST_EVENT_ID in stream.php on line 7
at terminal.
Also, substituting
while (file_exists($_GET["filename"]) && filesize($_GET["filename"]) < intval($_GET["filesize"]))
for
while (true)
produces an error at EventSource.
Without a sleep() call, the correct file size for a 3.3MB file, 3321824, was dispatched to the message event, though it was printed at console 61921, 26214, and 38093 times, respectively, when the same file was uploaded three times. The expected result is the file size of the file as it is being written at
stream_copy_to_stream($input, $file);
instead of the file size of the uploaded file object. Are fopen() or stream_copy_to_stream() blocking with respect to a different php process at stream.php?
Tried so far:
php
// can we merge `data.php`, `stream.php` to same file?
// can we use `STREAM_NOTIFY_PROGRESS`
// "Indicates current progress of the stream transfer
// in bytes_transferred and possibly bytes_max as well" to read bytes?
// do we need to call `stream_set_blocking` to `false`
// data.php
<?php
$filename = $_SERVER["HTTP_X_FILENAME"];
$input = fopen("php://input", "rb");
$file = fopen($filename, "wb");
stream_copy_to_stream($input, $file);
fclose($input);
fclose($file);
echo "upload of " . $filename . " successful";
?>
// stream.php
<?php
header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");
header("Connection: keep-alive");
// `PHP Notice: Undefined index: HTTP_LAST_EVENT_ID in stream.php on line 7` ?
$lastId = $_SERVER["HTTP_LAST_EVENT_ID"] || 0;
if (isset($lastId) && !empty($lastId) && is_numeric($lastId)) {
    $lastId = intval($lastId);
    $lastId++;
}
// else {
//   $lastId = 0;
// }
// while current file size read is less than or equal to
// `$_GET["filesize"]` of `$_GET["filename"]`
// how to loop only while above is `true`
while (true) {
    $upload = $_GET["filename"];
    // is this the correct function and variable to use
    // to get written bytes of `stream_copy_to_stream($input, $file);`?
    $data = filesize($upload);
    // $data = $_GET["filename"] . " " . $_GET["filesize"];
    if ($data) {
        sendMessage($lastId, $data);
        $lastId++;
    }
    // else {
    //   close stream
    // }
    // not necessary here, though without it thousands of `message` events
    // will be dispatched
    // sleep(1);
}

function sendMessage($id, $data) {
    echo "id: $id\n";
    echo "data: $data\n\n";
    ob_flush();
    flush();
}
?>
javascript
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<input type="file">
<progress value="0" max="0" step="1"></progress>
<script>
const [url, stream, header] = ["data.php", "stream.php", "x-filename"];
const [input, progress, handleFile] = [
    document.querySelector("input[type=file]")
    , document.querySelector("progress")
    , (event) => {
        const [file] = input.files;
        const [{size: filesize, name: filename}, headers, params] = [
            file, new Headers(), new URLSearchParams()
        ];
        // set `filename`, `filesize` as search parameters for `stream` URL
        Object.entries({filename, filesize})
            .forEach(([...props]) => params.append.apply(params, props));
        // set header for `POST`
        headers.append(header, filename);
        // reset `progress.value`, set `progress.max` to `filesize`
        [progress.value, progress.max] = [0, filesize];
        const [request, source] = [
            new Request(url, {
                method: "POST", headers: headers, body: file
            })
            // https://stackoverflow.com/a/42330433/
            , new EventSource(`${stream}?${params.toString()}`)
        ];
        source.addEventListener("message", (e) => {
            // update `progress` here,
            // call `.close()` when `e.data === filesize`
            // `progress.value = e.data`, should be this simple
            console.log(e.data, e.lastEventId);
        }, true);
        source.addEventListener("open", (e) => {
            console.log("fetch upload progress open");
        }, true);
        source.addEventListener("error", (e) => {
            console.error("fetch upload progress error");
        }, true);
        // sanity check for tests;
        // we don't need `source` when `e.data === filesize`;
        // we could call `.close()` within the `message` event handler
        setTimeout(() => source.close(), 30000);
        // we don't need `source` to be in the `Promise` chain,
        // though we could resolve if `e.data === filesize`
        // before `response`, then wait for `.text()`; etc.
        // TODO: if and where to merge or branch `EventSource`,
        // `fetch` to single or two `Promise` chains
        const upload = fetch(request);
        upload
            .then(response => response.text())
            .then(res => console.log(res))
            .catch(err => console.error(err));
    }
];
input.addEventListener("change", handleFile, true);
</script>
</body>
</html>
You need to call clearstatcache() to get the real file size. With a few other bits fixed, your stream.php may look like the following:
<?php
header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");
header("Connection: keep-alive");
// Check that the header has been sent, to avoid
// `PHP Notice: Undefined index: HTTP_LAST_EVENT_ID in stream.php`
// php 7+
//$lastId = $_SERVER["HTTP_LAST_EVENT_ID"] ?? 0;
// php < 7
$lastId = isset($_SERVER["HTTP_LAST_EVENT_ID"]) ? intval($_SERVER["HTTP_LAST_EVENT_ID"]) : 0;
$upload = $_GET["filename"];
$data = 0;
// if the file already exists, its initial size can be bigger than the new one,
// so we need to ignore it
$wasLess = $lastId != 0;
while ($data < $_GET["filesize"] || !$wasLess) {
    // system calls are expensive and are cached, on the assumption that
    // in most cases file stats do not change often,
    // so we clear the cache to get the most up-to-date data
    clearstatcache(true, $upload);
    $data = filesize($upload);
    $wasLess |= $data < $_GET["filesize"];
    // don't send a stale filesize
    if ($wasLess) {
        sendMessage($lastId, $data);
        $lastId++;
    }
    // not necessary here, though without it thousands of `message` events will be dispatched
    //sleep(1);
    // millions on a poor connection with large files. 1 second might be too much,
    // but 50 messages a second should be okay
    usleep(20000);
}

function sendMessage($id, $data)
{
    echo "id: $id\n";
    echo "data: $data\n\n";
    ob_flush();
    // no need to flush(). It adds the content length of the chunk to the stream
    // flush();
}
A few caveats:
Security. I mean the lack of it. As I understand it, this is a proof of concept, and security is the least of the concerns, yet the disclaimer should be there. This approach is fundamentally flawed, and should be used only if you don't care about DoS attacks or about information on your files leaking out.
CPU. Without usleep the script will consume 100% of a single core. With a long sleep you are at risk of uploading the whole file within a single iteration, in which case the exit condition will never be met. If you are testing locally, the usleep should be removed completely, since it is a matter of milliseconds to upload MBs locally.
Open connections. Both apache and nginx/fpm have a finite number of php processes that can serve requests. A single file upload occupies two of them for the time required to upload the file. With slow bandwidth or forged requests, this time can be quite long, and the web server may start to reject requests.
Clientside part. You need to analyse the response and finally stop listening to the events when the file is fully uploaded, as sketched below.
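For instance, reusing the `source`, `progress`, and `filesize` variables from the question's script, the `message` handler could be extended roughly like this (a sketch; it assumes `e.data` carries the current byte count, as sent by the stream.php above):
source.addEventListener("message", (e) => {
    progress.value = Number(e.data);
    // stop listening once the reported size reaches the full file size
    if (Number(e.data) >= filesize) {
        source.close();
    }
}, true);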
EDIT:
To make it more or less production friendly, you will need an in-memory storage like redis or memcache to store the file metadata.
When making the POST request, add a unique token which identifies the file, along with the file size.
In your javascript:
const fileId = Math.random().toString(36).substr(2); // or anything more unique
...
const [request, source] = [
    new Request(`${url}?fileId=${fileId}&size=${filesize}`, {
        method: "POST", headers: headers, body: file
    })
    , new EventSource(`${stream}?fileId=${fileId}`)
];
....
In data.php, register the token and report progress in chunks:
....
$fileId = $_GET['fileId'];
$fileSize = $_GET['size'];
setUnique($fileId, $fileSize);
$uploaded = 0;
while ($bytes = stream_copy_to_stream($input, $file, 1024)) {
    $uploaded += $bytes;
    updateProgress($fileId, $uploaded);
}
....
/**
* Check if Id is unique, and store processed as 0, and full_size as $size
* Set reasonable TTL for the key, e.g. 1hr
*
* @param string $id
* @param int $size
* @throws Exception if id is not unique
*/
function setUnique($id, $size) {
    // implement with your storage of choice
}
/**
* Updates uploaded size for the given file
*
* @param string $id
* @param int $processed
*/
function updateProgress($id, $processed) {
    // implement with your storage of choice
}
So your stream.php doesn't need to hit the disk at all, and can sleep as long as is acceptable for the UX:
....
list($progress, $size) = getProgress('non_existing_key_to_init_default_values');
$lastId = 0;
while ($progress < $size) {
    list($progress, $size) = getProgress($_GET["fileId"]);
    sendMessage($lastId, $progress);
    $lastId++;
    sleep(1);
}
.....
/**
* Get progress of the file upload.
* If id is not there yet, returns [0, PHP_INT_MAX]
*
* @param $id
* @return array $bytesUploaded, $fileSize
*/
function getProgress($id) {
    // implement with your storage of choice
}
The problem with 2 open connections cannot be solved unless you give up EventSource for good old polling. The response time of stream.php without the loop is a matter of milliseconds, and it is quite wasteful to keep the connection open all the time, unless you need hundreds of updates a second.
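A polling version of the client could look roughly like the following sketch; it assumes stream.php is changed to print the current progress once and exit, and it reuses the `stream`, `params`, `progress`, and `filesize` variables from the question's script:
const poll = () => {
    fetch(`${stream}?${params.toString()}`)
        .then(response => response.text())
        .then(bytes => {
            progress.value = Number(bytes);
            if (Number(bytes) < filesize) {
                // not done yet; ask again in a second
                setTimeout(poll, 1000);
            }
        });
};
poll();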
You need to break the file into chunks with javascript and send those chunks. When a chunk is uploaded, you know exactly how much data was sent.
This is the only way, and by the way, it is not hard.
var reader = new FileReader();
reader.onloadend = function (evt) {
    data.blob = btoa(evt.target.result);
    // do the upload here, I do it with jQuery ajax
};
// slice out the current chunk, then advance the window for the next one
// (initialize file.startByte = 0 and file.stopByte = 100000 before the first chunk)
var blob = file.slice(file.startByte, file.stopByte);
reader.readAsBinaryString(blob);
file.startByte += 100000;
file.stopByte += 100000;
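To turn that snippet into a complete upload, each chunk's completion can trigger the read of the next slice. A rough sketch; the function name, the callback, and the 100000-byte chunk size are illustrative:
// send the file as sequential 100000-byte chunks, one at a time
function uploadNextChunk(file, onDone) {
    if (file.startByte >= file.size) {
        onDone(); // every chunk has been sent
        return;
    }
    var reader = new FileReader();
    reader.onloadend = function (evt) {
        var encoded = btoa(evt.target.result);
        // POST `encoded` to the server here, then continue with the next chunk
        uploadNextChunk(file, onDone);
    };
    var blob = file.slice(file.startByte, file.stopByte);
    file.startByte += 100000;
    file.stopByte += 100000;
    reader.readAsBinaryString(blob);
}

// usage:
// file.startByte = 0; file.stopByte = 100000;
// uploadNextChunk(file, () => console.log("upload complete"));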