
How to import a database, update products that have changed, and delete products that have been removed

I'm "ok" at basic MySQL, but this is "WAY OVER MY HEAD"!

Objectives:

  • Import the database
  • Update products that have changed
  • Delete products that have been removed
  • Do all of this quickly and efficiently

The database table(s) are HUGE, speed is an issue.

It does not have to be MyISAM if InnoDB would be faster. Each database will be in its own table.

I was given this as a starting point for what I'm trying to do:

CREATE TABLE `table` LIKE LiveTable;
LOAD DATA INFILE..... INTO TABLE `table`;
UPDATE LiveTable SET `delete` = 1; -- Set the delete field to true; rows that get updated below are reset to 0
UPDATE LiveTable INNER JOIN `table` ON LiveTable.ID = `table`.ID
SET LiveTable.Col1 = `table`.Col1, LiveTable.Col2 = `table`.Col2….., LiveTable.`delete` = 0;
INSERT INTO LiveTable (ID, Col1, Col2,… `delete`)
SELECT `table`.ID, `table`.Col1, `table`.Col2,... 0 FROM `table`
LEFT JOIN LiveTable
ON `table`.ID = LiveTable.ID
WHERE LiveTable.ID IS NULL;
DELETE FROM LiveTable WHERE `delete` = 1; -- Remove rows that were not in the import
TRUNCATE TABLE `table`;

> CREATE TABLE `product_table`   (
>      `programname` VARCHAR(100) NOT NULL,
>      `name`        VARCHAR(160) NOT NULL,
>      `keywords`    VARCHAR(300) NOT NULL,
>      `description` TEXT NOT NULL,
>      `sku`         VARCHAR(100) NOT NULL,
>      -- This is the only "unique identifier given, none will be duplicates"
>      `price`       DECIMAL(10, 2) NOT NULL,
>      `created`     TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
>      `updatedat`   TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
>      `delete`      TINYINT(4) NOT NULL DEFAULT '0',
>      PRIMARY KEY (`sku`)   ) ENGINE=myisam DEFAULT CHARSET=latin1;
> 
> CREATE TABLE IF NOT EXISTS `temptable` LIKE `product_table`;
> 
> -- Remove data from the temp table if for some reason it has data in it.
> TRUNCATE TABLE `temptable`;
> 
> LOAD DATA LOW_PRIORITY LOCAL INFILE "catalog.csv" INTO TABLE
> `temptable`  FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY """"
> LINES TERMINATED BY "\n"  IGNORE 1 LINES (`PROGRAMNAME`, `NAME`,
> `KEYWORDS`, `DESCRIPTION`, `SKU`, `PRICE`);
> 
> 
> -- Set the delete field to true on the live table; rows still in the feed are reset to 0 below.
> UPDATE `product_table` SET    `delete` = 1;
> 
> UPDATE `temptable` ttable
>        INNER JOIN `product_table` mtable
>          ON ( mtable.sku = ttable.sku )
> SET    mtable.programname = ttable.programname,
>        mtable.name = ttable.name,
>        mtable.keywords = ttable.keywords,
>        mtable.description = ttable.description,
>        mtable.sku = ttable.sku,
>        mtable.price = ttable.price,
>        mtable.created = ttable.created,
>        mtable.updatedat = NOW(),-- Set Last Update
>        mtable.delete = 0; -- Set Delete to NO
> 
> -- Not sure what this is for...  I'm LOST at this part...   
> -- This type of join requires alias as far as I know? 
> -- INSERT ... SELECT is a single statement: it inserts the feed rows that have no match in the live table.
> INSERT INTO `product_table`
>             (`programname`,
>              `name`,
>              `keywords`,
>              `description`,
>              `sku`,
>              `price`,
>              `created`,
>              `updatedat`,
>              `delete`)
> SELECT tmptable.`programname`,
>        tmptable.`name`,
>        tmptable.`keywords`,
>        tmptable.`description`,
>        tmptable.`sku`,
>        tmptable.`price`,
>        tmptable.`created`,
>        tmptable.`updatedat`,
>        tmptable.`delete`
> FROM   `temptable` tmptable
>        LEFT JOIN `product_table` maintbl
>          ON tmptable.sku = maintbl.sku
> WHERE  maintbl.sku IS NULL;
> 
> DELETE FROM `product_table` WHERE  `delete` = 1; -- Rows still flagged were not in the feed
> 
> TRUNCATE TABLE `temptable`; -- Remove all the data from the temporary table.
Brad asked Mar 27 '12 04:03


2 Answers

I answered this question myself here: https://dba.stackexchange.com/questions/16197/innodb-update-slow-need-a-better-option/16283#16283

Using the information I've received from here, the web and several internet chat rooms, I've come up with the process below. Web source: http://www.softwareprojects.com/resources/programming/t-how-to-use-mysql-fast-load-data-for-updates-1753.html

Demo: http://sqlfiddle.com/#!2/4ebe0/1

The process is:

  1. Import into a new temp table.
  2. Update the old table with the information in the temp table.
  3. Insert new data into the table. (In the real world I'm making a new CSV file and using LOAD DATA INFILE for the insert.)
  4. Delete everything that is no longer in the data feed.
  5. Delete the temp table.
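
The five steps above can be sketched in MySQL roughly as follows. This is only an illustration under assumptions, not the exact code from the demo: it reuses the `product_table`, `temptable`, `catalog.csv` and `sku` names from the question, it assumes `LOCAL INFILE` is enabled on both client and server, and it removes stale rows with an anti-join rather than with the `delete` flag column.

```sql
-- 1. Import the feed into an empty staging table.
CREATE TABLE IF NOT EXISTS `temptable` LIKE `product_table`;
TRUNCATE TABLE `temptable`;
LOAD DATA LOCAL INFILE 'catalog.csv' INTO TABLE `temptable`
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES
  (`programname`, `name`, `keywords`, `description`, `sku`, `price`);

-- 2. Update rows that already exist in the live table.
UPDATE `product_table` AS p
       INNER JOIN `temptable` AS t ON t.`sku` = p.`sku`
SET    p.`programname` = t.`programname`,
       p.`name`        = t.`name`,
       p.`keywords`    = t.`keywords`,
       p.`description` = t.`description`,
       p.`price`       = t.`price`,
       p.`updatedat`   = NOW();

-- 3. Insert rows that are in the feed but not yet in the live table.
INSERT INTO `product_table`
       (`programname`, `name`, `keywords`, `description`, `sku`, `price`)
SELECT t.`programname`, t.`name`, t.`keywords`, t.`description`, t.`sku`, t.`price`
FROM   `temptable` AS t
       LEFT JOIN `product_table` AS p ON p.`sku` = t.`sku`
WHERE  p.`sku` IS NULL;

-- 4. Delete live rows that are no longer in the feed.
DELETE p
FROM   `product_table` AS p
       LEFT JOIN `temptable` AS t ON t.`sku` = p.`sku`
WHERE  t.`sku` IS NULL;

-- 5. Drop the staging table.
DROP TABLE `temptable`;
```

Because every statement joins on `sku`, the primary key that `CREATE TABLE ... LIKE` copies onto the staging table is what keeps the joins from scanning the whole feed for each row.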

This seems to be the fastest process so far.

Let me know what your opinion is.

Brad answered Nov 06 '22 17:11


InnoDB is usually much better than MyISAM at keeping tables available while INSERT, UPDATE and DELETE statements are running, because InnoDB uses row-level locking for updates whereas MyISAM uses table-level locking.

That is the first step.
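
A minimal sketch of checking and switching the storage engine, assuming the `product_table` from the question and a server where InnoDB is available. Converting a huge table rebuilds it, so it is worth trying on a copy first:

```sql
-- Check which storage engine the table currently uses.
SELECT TABLE_NAME, ENGINE
FROM   information_schema.TABLES
WHERE  TABLE_SCHEMA = DATABASE()
  AND  TABLE_NAME = 'product_table';

-- Convert the table to InnoDB (rebuilds the table and its indexes).
ALTER TABLE `product_table` ENGINE = InnoDB;
```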

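The second step is to disable the non-unique indexes on the table before loading data into it using ALTER TABLE .. DISABLE KEYS, and then to enable them again after the load using ALTER TABLE .. ENABLE KEYS. Note that DISABLE KEYS only takes effect on MyISAM tables and never disables the primary key or other unique indexes; on an InnoDB table it has no effect.

These two steps should give a large improvement in performance.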
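
The disable/enable pattern around a bulk load might look like the sketch below. It assumes a MyISAM staging table named `temptable` and the `catalog.csv` layout from the question. `DISABLE KEYS` only affects non-unique indexes, so with the schema shown in the question (a primary key and nothing else) it would change little; it pays off when the table also carries secondary indexes.

```sql
-- Stop maintaining non-unique indexes during the bulk load (MyISAM only).
ALTER TABLE `temptable` DISABLE KEYS;

LOAD DATA LOCAL INFILE 'catalog.csv' INTO TABLE `temptable`
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES
  (`programname`, `name`, `keywords`, `description`, `sku`, `price`);

-- Rebuild the disabled indexes in one pass after the load.
ALTER TABLE `temptable` ENABLE KEYS;
```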

As another optimization, when doing large-scale updates, break them down into batches (perhaps based on the primary key) so that not all of the rows are locked simultaneously.
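
One way to batch on the primary key is to walk through the staging table in key order. The sketch below is an assumption-laden illustration: the batch size of 10000, the `@last_sku` and `@batch_end` variables, and the choice of columns to update are all hypothetical, and the loop itself has to be driven by the client or a stored procedure.

```sql
-- Start before the first key; assumes no SKU is the empty string.
SET @last_sku = '';

-- Repeat the three statements below from the client
-- until @batch_end comes back NULL (no rows left).

-- Find the last key of the next batch of (hypothetically) 10000 rows.
SELECT MAX(batch.`sku`) INTO @batch_end
FROM  (SELECT `sku`
       FROM   `temptable`
       WHERE  `sku` > @last_sku
       ORDER  BY `sku`
       LIMIT  10000) AS batch;

-- Update only the live rows whose key falls inside this batch.
UPDATE `product_table` AS p
       INNER JOIN `temptable` AS t ON t.`sku` = p.`sku`
SET    p.`name`      = t.`name`,
       p.`price`     = t.`price`,
       p.`updatedat` = NOW()
WHERE  t.`sku` >  @last_sku
  AND  t.`sku` <= @batch_end;

-- Advance the cursor to the end of the batch just processed.
SET @last_sku = @batch_end;
```

With autocommit on, each UPDATE commits on its own, so row locks are held only for the duration of one batch.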

Sujoy Gupta answered Nov 06 '22 16:11