I have a MySQL table where each record can have unlimited custom fields (EAV model, though that doesn't matter here), each field can have unlimited options, and each option can have unlimited values.
Right now I am trying to build an export tool that will export all these custom fields with their values, that is, name => value pairs for each field. That's not the important part; it's here just to highlight that we're talking about a lot of MySQL queries for a single record and that the export will be pretty large.
For each row from my main table I must run around 100 separate SQL queries to get the fields, field options, and field option values. These queries are pretty fast because they all use the right indexes, but that is still 100 queries for a single record, and I expect to have around 50k records in my main table just to start with.
Right now, what I do is:
set_time_limit(0);
ini_set('memory_limit', '1G');
ini_set("auto_detect_line_endings", true);

$count = $export->count();
$date = date('Y-m-d-H-i-s');
$fileName = CHtml::encode($export->name) . '-' . $date . '.csv';
$processAtOnce = 100;
// ceil(), not round(): a final partial page must still be exported
$rounds = (int) ceil($count / $processAtOnce);

header("Content-Disposition: attachment; filename=\"{$fileName}\"");
header("Content-Type: text/csv");

// open the output stream once instead of once per page
$outStream = fopen('php://output', 'w');
$headerSet = false;

for ($i = 0; $i < $rounds; ++$i) {
    $limit = $processAtOnce;
    $offset = $i * $processAtOnce;
    $rows = $export->find($limit, $offset);
    if (empty($rows)) {
        break; // no more rows to export
    }
    if (!$headerSet) {
        // column names go out once, before the first data row
        fputcsv($outStream, array_keys($rows[0]), ',', '"');
        $headerSet = true;
    }
    foreach ($rows as $row) {
        fputcsv($outStream, array_values($row), ',', '"');
    }
    // php://output is write-only, so there is nothing to fgets(); flush instead
    flush();
}
fclose($outStream);
Basically, I count all the records and "paginate" them for export, then run through the pages to avoid loading too many SQL results at once.
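One detail in the page arithmetic is worth checking: the number of pages has to be rounded up, otherwise a final partial page is silently dropped. A minimal sketch, where `pageCount()` is a hypothetical helper name and the row counts are made-up illustration values:

```php
<?php
// Hypothetical helper: number of pages needed to cover $count rows.
function pageCount(int $count, int $perPage): int
{
    return (int) ceil($count / $perPage);
}

// 240 rows in pages of 100 need 3 pages.
// Rounding to nearest gives 2 and loses the last 40 rows.
$rounded = (int) round(240 / 100);
$ceiled  = pageCount(240, 100);
```

With 250 rows both calculations happen to agree, which is why this kind of error can go unnoticed in testing.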
I am wondering if this is a valid approach? Any thoughts?
My alternative would be to count all the records, split them into "pages", and for each page do an ajax request (a recursive function called after the previous request has completed successfully). Each ajax request would process maybe 1k records (these 1k would also be split like in the above example, run internally 10 times with 100 results each), writing them into a temporary directory (like part-1.csv, part-2.csv). At the end, when all the records are processed, I would create an archive from the folder containing all the CSV parts, force the browser to download it (window.location.href from within the last ajax call), then remove it from the server.
Is this a good alternative to the above?
Please note, my goal is to limit memory usage, which is why I think the second approach would help me more.
Please let me know what you think.
Thanks.
My final approach is the second one. After a lot of tests I concluded that in my case the second approach is far better in terms of memory usage. The entire export takes longer to complete, but that doesn't matter, since the GUI updates with live stats about the export, and overall it is a good user experience while waiting for the export to finish.
These are the steps I took:
1) Load the page and make the first ajax request to the server.
2) The server reads the first 1000 records in batches of 100 at a time, to avoid getting too many results back at once from MySQL.
3) The results are written to a file as part-x.csv, where x is the request number sent by ajax.
4) When there are no more records to add, the last ajax call creates the archive and deletes the folder containing the part-x.csv files. The server then returns a JSON param called "download", which contains the URL to download the file via PHP (fopen + fread + flush + fclose, followed by unlinking the archive file).
5) Using the "download" param, the browser does a window.location.href = json.download, which forces the file to be downloaded.
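Steps 2 and 3 can be sketched roughly as follows. This is only an illustration under assumptions: `writePart()` is a hypothetical function name, and `$fetchBatch` is a hypothetical callable standing in for `$export->find($limit, $offset)`. The column names are written only to the first part, which is one possible choice, not something the steps above prescribe.

```php
<?php
/**
 * Write one part-N.csv file for a single ajax request.
 *
 * $fetchBatch is a hypothetical callable (limit, offset) => array of
 * associative rows; it stands in for $export->find($limit, $offset).
 * Returns the number of rows written, so the caller knows when to stop.
 */
function writePart(callable $fetchBatch, string $dir, int $requestNo,
                   int $perRequest = 1000, int $batchSize = 100): int
{
    $written = 0;
    $start   = ($requestNo - 1) * $perRequest; // offset of this request's first row
    $handle  = null;

    for ($done = 0; $done < $perRequest; $done += $batchSize) {
        $rows = $fetchBatch($batchSize, $start + $done);
        if (empty($rows)) {
            break; // no more records
        }
        if ($handle === null) {
            // Open lazily so an empty request leaves no empty part file behind.
            $handle = fopen($dir . '/part-' . $requestNo . '.csv', 'w');
            if ($requestNo === 1) {
                // Column names only in the first part.
                fputcsv($handle, array_keys($rows[0]), ',', '"', '\\');
            }
        }
        foreach ($rows as $row) {
            fputcsv($handle, array_values($row), ',', '"', '\\');
            $written++;
        }
        unset($rows); // release the batch before fetching the next one
    }

    if ($handle !== null) {
        fclose($handle);
    }
    return $written;
}
```

A return value of 0 tells the ajax handler that there is nothing left to export, which is the point at which the archive can be built.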
I know it's more work this way, but as I said, the end result seems to be better than just loading everything at once the way I did the first time.
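For completeness, here is a rough sketch of the archive and download part (step 4). It assumes the zip extension (ZipArchive) is installed; the steps only say "archive", so zip is my choice for illustration. `buildArchive()` and `streamAndDelete()` are hypothetical names, and the archive path is assumed to live outside the parts folder so the folder can be removed afterwards.

```php
<?php
/**
 * Zip every part-*.csv in $dir into $archivePath, then remove the parts
 * and the folder. Assumes the zip extension (ZipArchive) is installed
 * and that $archivePath is outside $dir.
 */
function buildArchive(string $dir, string $archivePath): bool
{
    $parts = glob($dir . '/part-*.csv');
    if (empty($parts)) {
        return false; // glob() error or nothing to archive
    }
    natsort($parts); // part-2 before part-10

    $zip = new ZipArchive();
    if ($zip->open($archivePath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }
    foreach ($parts as $part) {
        $zip->addFile($part, basename($part));
    }
    if (!$zip->close()) { // the files are read when the archive is closed
        return false;
    }

    array_map('unlink', $parts); // safe only after close()
    return rmdir($dir);
}

/**
 * Send a file to the browser in small chunks, then delete it
 * (fopen + fread + flush + fclose, followed by unlink).
 */
function streamAndDelete(string $path, int $chunkSize = 8192): void
{
    if (!headers_sent()) {
        header('Content-Type: application/zip');
        header('Content-Disposition: attachment; filename="' . basename($path) . '"');
        header('Content-Length: ' . filesize($path));
    }
    $handle = fopen($path, 'rb');
    while (!feof($handle)) {
        echo fread($handle, $chunkSize);
        flush(); // push the chunk out instead of accumulating it
    }
    fclose($handle);
    unlink($path);
}
```

Reading the archive in fixed-size chunks keeps memory usage flat regardless of the archive size, which matches the goal of the whole exercise. If PHP output buffering is enabled, the buffer would also need to be flushed or disabled for this to hold.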