Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the T-SQL to normalize an existing table?

I'd like to convert an existing table into the 1st normal form (simplest normalization possible; see the example).

Do you happen to know what is the T-SQL is for this sort of problem? Many thanks!

enter image description here

Update

Tried the answer below, it worked perfectly. Here is the steps I used to test the answer:

  1. Start up Microsoft SQL Management Studio.
  2. Create the tables, with the data below.
  3. Make sure that the ID in "Customer" is set to "Primary Key", and "identity".
  4. Make sure that the ID in "Order" has no special settings (its a foreign key).
  5. Open up a database diagram, then create a 1:* relationship between the "Customer" and "Order" tables.
  6. Execute the script on the "Customer" table and the "Order" table, it will automatically normalize the data properly for you.
  7. This is very useful if you are starting from a flat .csv file that has just been imported, and you want to copy the information into a normalized form in the database.
like image 996
Contango Avatar asked Feb 11 '11 17:02

Contango


2 Answers

In the case above @Thomas has a perfectly workable solution. However, sometimes people simplify for the sake of asking the question, so I'll address what you might want to do if you need to to go to many tables (or the first table has no unique constraint on name for instance), not just two.

First, I will insert the data to a staging table and add a column for the id which will be null. Then I would write an insert to the parent table using the OUTPUT clause to output the ids and the natural key to a table variable. Then I would use the table variable to update the id field in the staging table. Then I would insert records from the staging table to the other tables. Since I now have the id there, it is no longer necessary to access the orginal parent table. (If the number of records are large, I might also index the staging table).

Now if you have no natural key, the process becomes harder because you have no way to identify which record goes to who. Then I usually add an identity to the staging table and then do the intial import to the parent table one record at a time (including the stagingtableid as a variable in the cursor) and then update the staging table with each Parent table id as it is created. Once all the intial records are updated, I use set-based processes to insert or update to the other tables.

The staging table also gives you the chance to fix any bad data locally before trying to put it into your production tables.

Other syntax you may need to know if things are complex or if this is a repeated process is the MERGE statment. This will insert if it is a new record and update if it is an existing record.

If this is a very complex transformation, you may consider using SSIS.

like image 75
HLGEM Avatar answered Sep 21 '22 22:09

HLGEM


Starting with the Customer table

INSERT INTO Customer (Name)
SELECT DISTINCT Name
FROM Flat_CSV_File

If you have repeated imports

INSERT INTO Customer (Name)
SELECT DISTINCT f.Name
FROM Flat_CSV_File f
LEFT OUTER JOIN Customer c ON f.Name = c.Name
WHERE c.Id IS NULL

Orders (your table name Order is a reserved word in TSQL, so you need to quote it with square brackets)

INSERT INTO [Order] (CustomerId, Description, Cost)
SELECT c.Id, f.Description, f.Cost
FROM Flat_CSV_File f
INNER JOIN Customer c ON f.Name = c.Name
like image 37
devio Avatar answered Sep 22 '22 22:09

devio