I have a csv file that can range from 50k to over 100k rows of data.
I'm currently using Laravel w/ Laravel Forge, MySQL, and Maatwebsite Laravel Excel package.
This is to be used by an end-user and not myself so I have created a simple form on my blade view as such:
{!! Form::open(array(
    'route'      => 'import.store',
    'class'      => 'form',
    'id'         => 'upload',
    'novalidate' => 'novalidate',
    'files'      => true)) !!}

    <div class="form-group">
        <h3>CSV Product Import</h3>
        {!! Form::file('upload_file', array('class' => 'file')) !!}
    </div>

    <div class="form-group">
        {!! Form::submit('Upload Products', array('class' => 'btn btn-success')) !!}
    </div>

{!! Form::close() !!}
This then stores the file on the server successfully and I'm now able to iterate through the results using something such as a foreach loop.
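For reference, a minimal sketch of what such a store action could look like (hypothetical controller code; it assumes the Laravel Excel 2.x Excel::load() API and the upload_file field from the form above):

public function store(Request $request)
{
    // Grab the uploaded CSV from the form above.
    $path = $request->file('upload_file')->getRealPath();

    // Load the file and walk the rows one by one (mapping logic omitted).
    Excel::load($path, function ($reader) {
        foreach ($reader->get() as $row) {
            // e.g. $row->sku, $row->title, ... -> insert into the related tables
        }
    });

    return redirect()->back();
}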
Now here are the issues I'm facing, in chronological order, along with my fixes/attempts (tested with a 10k-row CSV file):
As you can see, the results are marginal and nowhere near 50k; I can barely even make it to 10k.
I've read up and looked into possible suggestions such as:
Going with LOAD DATA LOCAL INFILE may not work, because my header columns can change per file; that's why I have logic to process/iterate through them.
Splitting files before importing is fine under 10k, but for 50k or more? That would be highly impractical.
Storing the file on the server and then having the server split it and run the pieces individually without troubling the end-user? Possibly, but I'm not even sure how to achieve this in PHP yet; I've only briefly read about it.
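For illustration, a rough sketch of one way that server-side split could look (a hypothetical splitCsv() helper, not code from this project; it repeats the header in every chunk so each piece stays self-describing):

function splitCsv($source, $rowsPerChunk = 5000)
{
    $in = fopen($source, 'r');
    $header = fgetcsv($in);              // keep the header row
    $chunks = array();
    $out = null;
    $count = 0;

    while (($line = fgetcsv($in)) !== false) {
        if ($count % $rowsPerChunk === 0) {
            if ($out) {
                fclose($out);
            }
            $chunkPath = $source . '.chunk' . count($chunks);
            $chunks[] = $chunkPath;
            $out = fopen($chunkPath, 'w');
            fputcsv($out, $header);      // repeat the header in each chunk
        }

        fputcsv($out, $line);
        $count++;
    }

    if ($out) {
        fclose($out);
    }
    fclose($in);

    return $chunks;                      // queue one job per chunk afterwards
}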
Also to note: my queue worker is set to time out after 10,000 seconds, which is also very impractical and bad practice, but that seemed to be the only way to keep it running before memory took a hit.
Now, I could give in and just upgrade the memory to 1 GB, but I feel that at best that may get me to 20k rows before it fails again. Something needs to process all these rows quickly and efficiently.
Lastly here is a glimpse of my table structure:
Inventory
+----+------------+-------------+-------+---------+
| id | profile_id | category_id | sku | title |
+----+------------+-------------+-------+---------+
| 1 | 50 | 51234 | mysku | mytitle |
+----+------------+-------------+-------+---------+
Profile
+----+---------------+
| id | name |
+----+---------------+
| 50 | myprofilename |
+----+---------------+
Category
+----+------------+--------+
| id | categoryId | name |
+----+------------+--------+
| 1 | 51234 | brakes |
+----+------------+--------+
Specifics
+----+---------------------+------------+-------+
| id | specificsCategoryId | categoryId | name |
+----+---------------------+------------+-------+
| 1 | 20 | 57357 | make |
| 2 | 20 | 57357 | model |
| 3 | 20 | 57357 | year |
+----+---------------------+------------+-------+
SpecificsValues
+----+-------------+-------+--------+
| id | inventoryId | name | value |
+----+-------------+-------+--------+
| 1 | 1 | make | honda |
| 2 | 1 | model | accord |
| 3 | 1 | year | 1998 |
+----+-------------+-------+--------+
Full CSV Sample
+----+------------+-------------+-------+---------+-------+--------+------+
| id | profile_id | category_id | sku | title | make | model | year |
+----+------------+-------------+-------+---------+-------+--------+------+
| 1 | 50 | 51234 | mysku | mytitle | honda | accord | 1998 |
+----+------------+-------------+-------+---------+-------+--------+------+
So a quick run-through of my logic workflow as simple as possible would be:
I hope someone can share some insight on possible ways to tackle this, keeping in mind that I'm using Laravel and that it's not a simple upload: I need to process each line and insert it into different related tables, otherwise I'd just LOAD DATA INFILE it all at once.
Thanks!
You seem to have already figured out the logic for interpreting the CSV lines and converting them to insert queries on the database, so I will focus on the memory exhaustion issue.
When working with large files in PHP, any approach that loads the entire file into memory will either fail, become unbearably slow, or require a lot more RAM than your Droplet has.
So my advice is:
Read the file line by line using fgetcsv
$handle = fopen('file.csv', 'r');

if ($handle) {
    while (($line = fgetcsv($handle)) !== false) {
        // Process this line and save to database
    }

    fclose($handle);
}
This way only one row at a time will be loaded to memory. Then, you can process it, save to the database, and overwrite it with the next one.
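Since your header columns can change per file, you can pair the line-by-line read with the header row and batch the inserts so you aren't firing one query per row. A rough sketch (table and column names are taken from your samples, so adjust them to the real schema):

$handle = fopen($path, 'r');
$header = fgetcsv($handle);               // first line holds the column names
$batch  = array();

while (($line = fgetcsv($handle)) !== false) {
    $row = array_combine($header, $line); // e.g. ['sku' => 'mysku', 'title' => 'mytitle', ...]

    $batch[] = array(
        'profile_id'  => $row['profile_id'],
        'category_id' => $row['category_id'],
        'sku'         => $row['sku'],
        'title'       => $row['title'],
    );

    if (count($batch) >= 500) {
        DB::table('inventory')->insert($batch); // one query per 500 rows
        $batch = array();
    }
}

if ($batch) {
    DB::table('inventory')->insert($batch);   // flush the final partial batch
}

fclose($handle);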
Keep a separate file handle for logging
Your server is short on memory, so logging errors to an array may not be a good idea, as all the errors will be kept in memory. That can become a problem if your CSV has lots of entries with empty SKUs and category IDs.
Laravel comes out of the box with Monolog and you can try to adapt it to your needs. However, if it also ends up using too much resources or not fitting your needs, a simpler approach may be the solution.
$log = fopen('log.txt', 'w');

if (some_condition) {
    fwrite($log, $text . PHP_EOL);
}

fclose($log); // close the handle once the import finishes
Then, at the end of the script you can store the log file wherever you want.
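For example, if you want to keep it alongside your other application files, something along these lines should work (it assumes the default Storage disk; the path is illustrative):

// After fclose($log), persist the finished log on the default disk.
Storage::put('import-logs/import-' . date('Y-m-d_His') . '.log', file_get_contents('log.txt'));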
Disable Laravel's query log
Laravel keeps all your queries stored in memory, and that's likely to be a problem for your application. Luckily, you can use the disableQueryLog method to free some precious RAM.
DB::connection()->disableQueryLog();
Use raw queries if needed
I think it's unlikely that you will run out of memory again if you follow these tips, but you can always sacrifice some of Laravel's convenience to extract that last drop of performance.
If you know your way around SQL, you can execute raw queries to the database.
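For instance, a plain parameterized insert skips the Eloquent model layer entirely (the specifics_values table name is assumed from your samples):

DB::insert(
    'insert into specifics_values (inventoryId, name, value) values (?, ?, ?)',
    array($inventoryId, $name, $value)
);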
Edit:
As for the timeout issue, you should be running this code as a queued task, as suggested in the comments, regardless. Inserting that many rows WILL take some time (especially if you have lots of indexes) and the user shouldn't be staring at an unresponsive page for that long.
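A minimal sketch of that setup, assuming Laravel 5.3+ style job classes (the ImportProducts name and $storedPath variable are just placeholders):

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ImportProducts implements ShouldQueue
{
    use InteractsWithQueue, Queueable, SerializesModels;

    protected $path;

    public function __construct($path)
    {
        $this->path = $path;
    }

    public function handle()
    {
        // Run the fgetcsv loop from above against $this->path here.
    }
}

// In the controller, right after storing the upload:
dispatch(new ImportProducts($storedPath));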