I have a large table (~10 million records) that contains several keys into other, smaller tables. The keys are unique in each of the smaller tables but not in the large one. I would like to add a column to the large table from one of the smaller tables based on the keys matching, but I'm not sure of the "right" way to do it. I have a solution that works, but it takes a fair amount of time (thought that may be unavoidable) and just generally doesn't feel like it's the optimal way to do it. Here's what I have:
CREATE TABLE new_big_table LIKE big_table;
ALTER TABLE new_big_table ADD(new_column TINYINT NOT NULL);
INSERT INTO new_big_table SELECT big_table.*, smaller_table.my_column
  FROM big_table JOIN smaller_table ON big_table.key1 = smaller_table.key1
  AND big_table.key2 = smaller_table.key2;
This gets the job done, but it really smells like I'm doing it all wrong. It seems like at minimum I shouldn't need to create a duplicate of the table to get this done. Is there some more direct (and more efficient?) way of doing this?
It might be worth mentioning that this is for a personal, hobby project at home, so I am free to hog all the resources of the machine (since I'm the only one using it). As such, if there are any straightforward performance tuning tips for doing things like this, I'd appreciate them (I'm experimenting with this on an Amazon EC2 instance since it should be a lot faster and have more memory than my personal desktop).
Have you tried to add the column in-place?
alter table big_table add new_column tinyint;
update  big_table bt
join    smaller_table st
on      bt.key1 = st.key1
        and bt.key2 = st.key2
set     bt.new_column = st.my_column;
alter table big_table modify new_column tinyint not null;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With