
How to import a huge CSV file with 200,000 rows into MySQL (asynchronously and fast)?

I have to write a PHP script that imports data from a given CSV file into a MySQL database. The CSV file can contain up to 200,000 rows. I tried the following approaches, but problems arose:

  1. LOAD DATA LOCAL INFILE: I cannot use the LOAD DATA LOCAL INFILE statement because I want to run some validations BEFORE inserting the rows; besides, our DB admin doesn't want me to use that statement, and I don't know why.
  2. FOR LOOP: Inserting line by line inside a FOR loop takes too much time and results in a connection timeout.

Now I am thinking of splitting the CSV file into smaller chunks and then inserting them asynchronously. I am already done with splitting the CSV, but I have no idea how to insert into my database asynchronously in a quick and safe way. I heard that I will need Ajax for this.

Any solution you can recommend? Thanks a lot in advance!

Asked Sep 10 '15 by Mead Umandal

2 Answers

Thanks to everyone who answered this question. I have discovered a solution! I just wanted to share it, in case someone else needs to create a PHP script that imports a huge CSV file into a MySQL database (asynchronously and fast!). I have tested my code with 400,000 rows and the import is done in seconds. I believe it would also work with larger files; you just have to raise the maximum upload file size (the upload_max_filesize and post_max_size settings in php.ini).

In this example, I will be importing a CSV file that contains two columns (name, contact_number) into a MySQL table that contains the same columns.

Your CSV file should look like this:

Ana, 0906123489
John, 0908989199
Peter, 0908298392
...

So, here's the solution.

First, create your table

CREATE TABLE `testdb`.`table_test` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `contact_number` VARCHAR(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB;

Second, I have 4 PHP files. All you have to do is place them into a single folder. The PHP files are as follows:

index.php

<form action="upload.php" method="post" enctype="multipart/form-data">
    <input type="file" name="csv" value="" />
    <input type="submit" name="submit" value="Save" />
</form>

connect.php

<?php
//modify your connection settings here
$servername = "localhost";
$username = "root";
$password = "";
$dbname = "testdb"; //must match the database used in the CREATE TABLE above
$conn = new mysqli($servername, $username, $password, $dbname);
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}
?>
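If the CSV contains non-ASCII characters (accented names, for example), it can also help to pin the connection charset right after the connection check, before the closing ?> in connect.php. An optional one-liner, assuming utf8mb4 is what your table actually uses:

$conn->set_charset('utf8mb4'); //assumption: match this to your table's real charset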

senddata.php

<?php
include('connect.php');
//path of the chunk file created by upload.php
$data = $_POST['file'];
$handle = fopen($data, "r");
if ($handle) {
    //instead of executing one query per row,
    //build a single SQL query that inserts the whole batch
    $sql = "INSERT INTO table_test (name, contact_number) VALUES ";
    while (($line = fgets($handle)) !== false) {
        $sql .= "(" . trim($line) . "),";
    }
    $sql = rtrim($sql, ","); //drop the trailing comma
    $conn->query($sql);
    fclose($handle);
}
//delete the chunk file once it has been imported into the DB
unlink($data);
?>
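Note that the batch INSERT above pastes each chunk line straight into the SQL string, so it assumes the values never contain quotes or other characters that need escaping. If your data is less predictable, a hedged variant of senddata.php could strip the quoting added by upload.php and escape every value via mysqli before building the same multi-row INSERT. A minimal sketch:

<?php
include('connect.php');
$data = $_POST['file'];
if (($handle = fopen($data, 'r')) !== false) {
    $values = array();
    while (($row = fgetcsv($handle)) !== false) {
        //strip the single quotes written by upload.php, then escape the raw value
        $name   = $conn->real_escape_string(trim($row[0], " '"));
        $number = $conn->real_escape_string(trim($row[1], " '"));
        $values[] = "('$name', '$number')";
    }
    fclose($handle);
    if ($values) {
        $conn->query("INSERT INTO table_test (name, contact_number) VALUES " . implode(',', $values));
    }
}
unlink($data);
?>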

upload.php

<div id="result"></div> <!-- the ajax responses from senddata.php land here -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.1/jquery.js"></script>
<script>
//function that sends one chunk file to senddata.php, which inserts it into the database
function senddata(filename){
    $.ajax({
        type: "POST",
        url: "senddata.php",
        data: { file: filename },
        async: true,
        success: function(html){
            $("#result").html(html);
        }
    });
}
</script>
<?php
$batchsize = 1000; //number of rows per chunk; adjust this based on your needs
if($_FILES['csv']['error'] == 0){
    $ext = strtolower(pathinfo($_FILES['csv']['name'], PATHINFO_EXTENSION));
    $tmpName = $_FILES['csv']['tmp_name'];
    if($ext === 'csv'){ //check that the uploaded file is a CSV
        if(($handle = fopen($tmpName, 'r')) !== FALSE) {
            set_time_limit(0);
            $row = 0;
            $file = null;
            $chunkname = '';
            while(($data = fgetcsv($handle)) !== FALSE) {
                //splitting of the CSV file: every $batchsize rows, close the finished
                //chunk, send it to senddata.php via ajax, and start a new chunk file
                if ($row % $batchsize == 0) {
                    if ($file) {
                        fclose($file);
                        echo "<script> senddata('$chunkname'); </script>";
                    }
                    $chunkname = "minpoints$row.csv";
                    $file = fopen($chunkname, "w");
                }
                //write the row as a pre-quoted value list, ready for senddata.php
                fwrite($file, "'" . $data[0] . "', '" . $data[1] . "'" . PHP_EOL);
                $row++;
            }
            //close and send the last (possibly partial) chunk
            if ($file) {
                fclose($file);
                echo "<script> senddata('$chunkname'); </script>";
            }
            fclose($handle);
            //alert once every chunk has been dispatched
            echo "<script> alert('CSV imported!') </script>";
        }
    }
    else
    {
        echo "Only CSV files are allowed.";
    }
}
?>
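The question mentions wanting to validate rows before they reach the database; the script above imports everything as-is. A check could slot into the fgetcsv loop of upload.php, before the row is written to a chunk file. A minimal sketch, where the helper name is_valid_row and its rules are hypothetical:

//hypothetical helper: reject rows that would not fit the table_test columns
function is_valid_row($data) {
    return count($data) >= 2
        && trim($data[0]) !== ''                        //name must not be empty
        && preg_match('/^\d{7,15}$/', trim($data[1]));  //contact_number: digits only
}

//inside the while(($data = fgetcsv($handle)) !== FALSE) loop, before anything is written:
//    if (!is_valid_row($data)) { continue; } //skip invalid rows; they are not counted toward a batch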

That's it! You now have a pure PHP script that can import hundreds of thousands of rows in seconds! :) (Thanks to my partner who taught me and gave me the idea of how to use Ajax.)

Answered Nov 15 '22 by Mead Umandal

The main slowness comes from sending every single line as its own request. I would suggest sending one query per 1000 or 500 rows, in the same format used by mysqldump --opt, i.e. building a long string like

 insert into datatable (name, prename, comment)
   values ('wurst', 'hans', 'someone')
   , ('bush', 'george', 'otherone')
   , ...
   ;

You should check how long your queries are allowed to be, and if the MySQL server is under your control, you could extend the maximum query length (the max_allowed_packet setting).
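A quick way to check that limit (and, if you have the privileges, raise it for new connections) from PHP; this sketch assumes a mysqli connection like the $conn from connect.php in the answer above:

$res = $conn->query("SHOW VARIABLES LIKE 'max_allowed_packet'");
$row = $res->fetch_assoc();
echo "max_allowed_packet = " . $row['Value'] . " bytes";

//raise the limit if you control the server (needs the right privileges; applies to new connections)
//$conn->query("SET GLOBAL max_allowed_packet = 64 * 1024 * 1024");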

If this is still too slow (I mean, 200K rows is not much at all), then you could try to improve the CSV reading.

It is a bit of work to split the rows into those chunks, but you could write a small chunk class for this, so adding the rows gets a bit easier.

The usage of this class would look like:

$chunk->prepare("insert into datatable (name, prename, comment) values");
$chunk->setSize(1000);

foreach ($row...){
   if($query = $chunk->addRow(...)){
       callUpdate($query);
   }
}
if($query = $chunk->clear()){
  callUpdate($query);
}
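The answer only shows how the chunk class is used; a minimal sketch of such a class (the method names follow the usage above, while the class name and implementation are an assumption) could be:

class Chunk
{
    private $prefix = '';
    private $size = 1000;
    private $rows = array();

    //the "insert into ... values" prefix every generated query starts with
    public function prepare($prefix) { $this->prefix = $prefix; }

    //how many rows go into one query
    public function setSize($size) { $this->size = $size; }

    //add one already-escaped "(...)" tuple; returns a full query once the chunk is full, otherwise null
    public function addRow($tuple) {
        $this->rows[] = $tuple;
        if (count($this->rows) >= $this->size) {
            return $this->clear();
        }
        return null;
    }

    //flush whatever is left; returns a query, or null if the chunk is empty
    public function clear() {
        if (!$this->rows) {
            return null;
        }
        $query = $this->prefix . ' ' . implode(', ', $this->rows) . ';';
        $this->rows = array();
        return $query;
    }
}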
Answered Nov 15 '22 by flaschenpost