
Update a column value for 500 million rows in Interval Partitioned table

We have a table with 10 billion rows. The table is interval partitioned on a date column. In one subpartition, we need to update that date to a new value for the 500 million rows that match certain criteria. Because the table is partitioned on this same date, the update will definitely affect partitioning, for example by creating new partitions. Could anyone give me pointers on the best approach to follow?

Thanks in advance!

pratikch asked Oct 16 '14 20:10




1 Answer

If you are going to update the partitioning key and the source rows are all in a single (sub)partition, a reasonable approach is to:

  1. Create a temporary table for the updated rows. If possible, apply the update on the fly in the SELECT

    CREATE TABLE updated_rows
    AS
    SELECT add_months(partition_key, 1) AS partition_key, other_columns...
      FROM original_table PARTITION (xxx)
     WHERE ...;
    
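  2. Drop the original (sub)partition. This removes every row in it, so updated_rows must also hold any rows you want to keep unchanged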

    ALTER TABLE original_table DROP PARTITION xxx;
    
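  3. Reinsert the updated rows

    -- SELECT * maps columns by position, so updated_rows must match original_table's column order.
    INSERT /*+ APPEND */ INTO original_table
    SELECT * FROM updated_rows;
    COMMIT;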
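
Since the question asks what happens to partition creation, a couple of sanity checks around these steps may help. This is only a sketch: it reuses the placeholder names from the steps above (`original_table`, `updated_rows`, partition `xxx`) and assumes the table lives in the current schema. On an interval-partitioned table, rows whose new date falls beyond the existing partitions cause Oracle to create the missing partitions automatically on insert.

```sql
-- Run BEFORE step 2: compare the staged row count with the source partition.
-- staged_rows should equal the number of rows you expect to move
-- (all of source_rows if the whole partition is being rewritten).
SELECT (SELECT COUNT(*) FROM updated_rows)                    AS staged_rows,
       (SELECT COUNT(*) FROM original_table PARTITION (xxx))  AS source_rows
  FROM dual;

-- Run AFTER step 3: list the table's partitions. Rows with INTERVAL = 'YES'
-- are partitions that Oracle created automatically for the interval scheme.
SELECT partition_name, high_value, interval
  FROM user_tab_partitions
 WHERE table_name = 'ORIGINAL_TABLE'
 ORDER BY partition_position;
```

If the first query shows fewer staged rows than source rows, the difference is what step 2 would delete for good.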
    

In case you have issues with CTAS or INSERT INTO SELECT for 500M rows, consider partitioning the temporary table and moving the data in batches.
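
One way to do the batching might look like the sketch below. It is an illustration under assumptions, not a tested recipe: the columns `id` and `payload`, the filter `status = 'STALE'`, and the choice of 8 hash partitions are all hypothetical stand-ins for the real column list, criteria, and batch count. The staging table is hash partitioned so that each of its partitions can be reinserted and committed as a separate batch.

```sql
-- Staging table split into 8 hash partitions (hypothetical columns and filter).
CREATE TABLE updated_rows
PARTITION BY HASH (id) PARTITIONS 8
AS
SELECT add_months(partition_key, 1) AS partition_key, id, payload
  FROM original_table PARTITION (xxx)
 WHERE status = 'STALE';

-- ... drop the original (sub)partition here, as in step 2 ...

-- Reinsert one staging partition at a time.
BEGIN
  FOR p IN (SELECT partition_name
              FROM user_tab_partitions
             WHERE table_name = 'UPDATED_ROWS'
             ORDER BY partition_position)
  LOOP
    EXECUTE IMMEDIATE
      'INSERT /*+ APPEND */ INTO original_table (partition_key, id, payload) ' ||
      'SELECT partition_key, id, payload ' ||
      '  FROM updated_rows PARTITION (' || p.partition_name || ')';
    -- A direct-path insert must be committed before the same session
    -- can touch original_table again, so commit after every batch.
    COMMIT;
  END LOOP;
END;
/
```

Listing the columns explicitly in the INSERT avoids depending on the column order of the staging table, and committing per batch keeps undo and temp usage bounded by the size of one staging partition.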

Kombajn zbożowy answered Nov 09 '22 05:11