Scenario
I need to update a SQL 2008 database daily via a spreadsheet (the only option available). The format is pretty basic, however there are potentially millions of records. Column1 and Column3 will have many predefined duplicate values, already pulled out into separate tables.
Spreadsheet Sample
Column1 Column2 Column3
Apple 10 Red
Apple 20 Red
Apple 15 Blue
Apple 21 Green
Orange 10 Orange
Orange 7 Orange
Orange 9 Red
Orange 70 Blue
Orange 10 Blue
DB Setup
My database is set up with three separate tables:
//Lookup_Column1
id type
1 Apple
2 Orange
//Lookup_Column3
id type
1 Red
2 Blue
3 Green
4 Orange
//Main - this is what should be inserted, after Column1
//and Column2 are matched to their respective ID's
key Column1 Column2 Column3
1 1 10 1
2 1 20 1
3 1 15 2
4 1 21 3
5 2 10 4
6 2 7 4
7 2 9 1
8 2 70 2
9 2 10 2
Question
How can I write the SQL to insert records that match the information from the lookup tables? How can I go from this:
INSERT INTO Main(Column1, Column2) VALUES ('Apple', 10, 'Red');
To this:
INSERT INTO Main(Column1, Column2) VALUES (1, 10, 1);
//pulled from lookup tables, where Apple = 1 and Red = 1
You could try something like this:
INSERT INTO Main(Column1, Column2, Column3) VALUES
(
(SELECT id FROM Lookup_Column1 WHERE type = 'Apple'),
10,
(SELECT id FROM Lookup_Column3 WHERE type = 'Red')
);
There isn't any fault-tolerance, but it would work as long as you could parse your spreadsheet values into SELECT statements.
The source of data that inserted into a table is defined as
VALUES ( { DEFAULT | NULL | expression } [ ,...n ] ) [ ,...n ]
| derived_table
| execute_statement
| <dml_table_source>
| DEFAULT VALUES
So we can use a derived_table which is defined as
derived_table
Is any valid SELECT statement that returns rows of data to be loaded into the table. The SELECT statement cannot contain a common table expression (CTE).
INSERT INTO
MAIN
(Column1, Column2, column3)
SELECT
lc1.id,
10,
lc2.id
FROM
Lookup_Column1 lc1,
Lookup_Column2 lc2
WHERE
lc1.type = 'Apple'
and
lc2.type = 'Red'
For more information see INSERT (Transact-SQL)
If you could get your spreadsheet values into a staging table (or linked to the spreadsheet directly using a linked server or OPENDATASOURCE ) you could change your INSERT clause to
INSERT INTO
MAIN
( Column1, Column2, column3)
SELECT
lc1.id,
s.column2,
lc2.id
FROM
OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\YourdataSource.xls;Extended Properties=EXCEL 5.0')...[Sheet1$] s
INNER JOIN Lookup_Column1 lc1
ON s.column1 = lc1.type
INNER JOIN Lookup_Column2 lc2
ON s.column3 = lc2.type
This would allow you to remove the looping that you're currently thinking of doing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With