I have created a script that reads data from a CSV file, checks whether each record already exists in the database, and imports it if it does not. If the record does exist (matched on its unique code), the rest of the information needs to be updated from the CSV file.
For example: my CSV file contains a member with code WTW-2LT, name Alex and surname Johnson. The script checks whether a member with code WTW-2LT, name Alex and surname Johnson already exists; if it does, the contact details and extra details need to be updated from the CSV (related details such as subject and lecturer also need to be checked; all details for a member are on one line in the CSV). If it doesn't exist, the new member simply has to be created.
Here is my script so far, with most other checks stripped out to avoid distraction:
$header = null;
while ($row = fgetcsv($fp, null, ";")) {
    if ($header === null) {
        $header = $row;
        continue;
    }
    $record = array_combine($header, $row);

    $member = $this->em->getRepository(Member::class)->findOneBy([
        'code' => $record['member_code'],
        'name' => $record['name'],
        'surname' => $record['surname'],
    ]);
    if (!$member) {
        $member = new Member();
        $member->setCode($record['member_code']);
        $member->setName($record['name']);
        $member->setSurname($record['surname']);
    }
    $member->setContactNumber($record['phone']);
    $member->setAddress($record['address']);
    $member->setEmail($record['email']);

    $subject = $this->em->getRepository(Subject::class)->findOneBy([
        'subject_code' => $record['subj_code'],
    ]);
    if (!$subject) {
        $subject = new Subject();
        $subject->setCode($record['subj_code']);
    }
    $subject->setTitle($record['subj_title']);
    $subject->setDescription($record['subj_desc']);
    $subject->setLocation($record['subj_loc']);

    $lecturer = $this->em->getRepository(Lecturer::class)->findOneBy([
        'subject' => $subject,
        'name' => $record['lec_name'],
        'code' => $record['lec_code'],
    ]);
    if (!$lecturer) {
        $lecturer = new Lecturer();
        $lecturer->setSubject($subject);
        $lecturer->setName($record['lec_name']);
        $lecturer->setCode($record['lec_code']);
    }
    $lecturer->setEmail($record['lec_email']);
    $lecturer->setContactNumber($record['lec_phone']);

    $member->setLecturer($lecturer);

    $validationErrors = $this->validator->validate($member);
    if (!count($validationErrors)) {
        $this->em->persist($member);
        $this->em->flush();
    } else {
        // handle validation errors
    }
}
As you can see, this script has to query the database three times to check whether one CSV line exists. In my case the files have 2000+ lines, so running three existence queries for every line is quite time-consuming.
Unfortunately, I also can't import the rows in batches: if a subject doesn't exist yet, it would be created over and over until the batch is flushed to the database, leaving me with duplicate records that serve no purpose.
How can I improve performance and speed as much as possible? For example, should I first fetch all records from the database and store them in arrays (memory-consuming?), then do the checks against those arrays, adding each processed line to them as I go?
Can somebody please help me find a way to improve this (with sample code, please)?
To be honest, I don't find 2000+ rows with three times that number of queries all that much. But since you are asking about performance, here are my two cents:
Using a framework always adds overhead, so writing this code in native PHP would already make it run quicker. I'm not familiar with Symfony, but I assume you store your data in MySQL, where you can use the command INSERT ... ON DUPLICATE KEY UPDATE. If you have set the three fields (code, name, surname) as a primary or unique key (which I assume), you can use that to insert the data, but update the values in the database instead if the key already exists. MySQL will do the checks for you to see whether the data has changed: if not, no disk write will happen.
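As a sketch, such an upsert could look like the following; the members table and its column names here are assumptions, so adjust them to your actual schema:

INSERT INTO members (code, name, surname, phone, address, email)
VALUES ('WTW-2LT', 'Alex', 'Johnson', '555-0100', '1 Main St', 'alex@example.com')
ON DUPLICATE KEY UPDATE
    phone = VALUES(phone),
    address = VALUES(address),
    email = VALUES(email);

MySQL only executes the UPDATE branch when the unique key already exists, so one statement covers both the insert and the update case.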
I'm quite certain you can run native SQL from Symfony, allowing you to keep the security the framework provides while speeding up your inserts.
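In Doctrine you can drop down to the DBAL connection for raw, parameterized SQL. A sketch, reusing the hypothetical members table and column names from above (on older DBAL versions the method is named executeUpdate() instead of executeStatement()):

$conn = $this->em->getConnection();
$sql = 'INSERT INTO members (code, name, surname, phone, address, email)
        VALUES (:code, :name, :surname, :phone, :address, :email)
        ON DUPLICATE KEY UPDATE
            phone = VALUES(phone),
            address = VALUES(address),
            email = VALUES(email)';
$conn->executeStatement($sql, [
    'code'    => $record['member_code'],
    'name'    => $record['name'],
    'surname' => $record['surname'],
    'phone'   => $record['phone'],
    'address' => $record['address'],
    'email'   => $record['email'],
]);

Bound parameters keep the query safe against SQL injection while avoiding the three lookup queries per row.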
Generally, if you want performance, my best experience has been to dump all the data into the database and then transform it in there using SQL statements; that way the DBMS is able to optimize all of your steps.
You can import CSV files directly into your MySQL database with the SQL command
LOAD DATA INFILE 'data.csv'
INTO TABLE tmp_import
The command has a lot of options where you can specify your CSV file's format.
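For example, for a semicolon-delimited file with a header row (a sketch; adjust the terminators and quoting to your actual file):

LOAD DATA INFILE 'data.csv'
INTO TABLE tmp_import
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

The IGNORE 1 LINES clause skips the header row, matching what the PHP loop does with its first iteration.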
If your data.csv is a full dump containing all old and new rows, then you can just replace your current table with the imported one, after you've fixed it up a bit.
For example, your CSV file (and import table) might look a bit like:
WTW-2LT, Alex, Johnson, subj_code1, ..., lec_name1, ...
WTW-2LT, Alex, Johnson, subj_code1, ..., lec_name2, ...
WTW-2LT, Alex, Johnson, subj_code2, ..., lec_name3, ...
WTW-2LU, John, Doe, subj_code3, ..., lec_name4, ...
You could then get the distinct rows via grouping:
SELECT member_code, name, surname
FROM tmp_import
GROUP BY member_code, name, surname
If member_code is a key, you can just GROUP BY member_code in MySQL. The DBS won't complain, even though I believe it's technically against the standard.
To get the rest of your data you do the same:
SELECT subj_code, subj_title, member_code
FROM tmp_import
GROUP BY subj_code
and
SELECT lec_code, lec_name, subj_code
FROM tmp_import
GROUP BY lec_code
assuming subj_code and lec_code are both keys for subjects and lecturers.
To actually save this result as a table you can use MySQL's CREATE TABLE ... SELECT syntax, for example:
CREATE TABLE tmp_import_members
SELECT member_code, name, surname
FROM tmp_import
GROUP BY member_code, name, surname
You can then do the inserts and updates in two queries:
INSERT INTO members (member_code, name, surname)
SELECT member_code, name, surname
FROM tmp_import_members
WHERE tmp_import_members.member_code NOT IN (
SELECT member_code FROM members WHERE member_code IS NOT NULL
);
UPDATE members
JOIN tmp_import_members ON
members.member_code = tmp_import_members.member_code
SET
members.name = tmp_import_members.name,
members.surname = tmp_import_members.surname;
and the same for subjects and lecturers, to your liking.
This all amounts to one bulk load plus a small, fixed number of set-based statements, no matter how many rows the CSV contains.
Again: if your CSV file contains all rows, you could just replace your existing tables and save the 3 inserts and 3 updates.
Make sure that you create indexes on the relevant columns of your temporary tables so that MySQL can speed up the NOT IN and JOIN in the above queries.
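For example (the index name is just a placeholder):

CREATE INDEX idx_member_code ON tmp_import_members (member_code);

With this index in place, the NOT IN subquery and the JOIN on member_code become index lookups instead of full table scans.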
You can first run a custom SQL query to get the counts from all three tables:
SELECT
(SELECT COUNT(*) FROM member WHERE someCondition) as memberCount,
(SELECT COUNT(*) FROM subject WHERE someCondition) as subjectCount,
(SELECT COUNT(*) FROM lecturer WHERE someCondition) as lecturerCount
Then, based on the counts, you can tell whether the data is already present in your tables. You don't have to run the queries multiple times for uniqueness checks if you go with native SQL.
Check out this link to learn how to run custom SQL in Doctrine:
Symfony2 & Doctrine: Create custom SQL-Query
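As a sketch, running that count query through Doctrine's DBAL connection might look like this (fetchAssociative() is the current DBAL method name; the someCondition placeholders stand in for whatever your real conditions are):

$sql = '
    SELECT
        (SELECT COUNT(*) FROM member WHERE someCondition) AS memberCount,
        (SELECT COUNT(*) FROM subject WHERE someCondition) AS subjectCount,
        (SELECT COUNT(*) FROM lecturer WHERE someCondition) AS lecturerCount
';
$counts = $this->em->getConnection()->fetchAssociative($sql);
if ($counts['memberCount'] == 0) {
    // member is not present yet, so create it
}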