
Database design for ETL - Surrogate vs natural key

We are currently in the process of redesigning our ETL database.

So far we have used the following design with natural keys: CustomerID, OrderID and SystemType.

An OrderID can be repeated for different customers, which is why the SystemType key helps us create a unique index. Our joins are complicated because we always need to join on all three keys.
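To make the current design concrete, here is a minimal sketch assuming SQLite through Python's standard `sqlite3` module; the `orders` and `order_lines` tables, their extra columns, and the sample values are hypothetical. It declares a composite primary key over CustomerID, OrderID and SystemType and shows that every join has to repeat all three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical parent table keyed on the three natural-key columns.
    CREATE TABLE orders (
        CustomerID  TEXT NOT NULL,
        OrderID     TEXT NOT NULL,
        SystemType  TEXT NOT NULL,
        amount      REAL,
        PRIMARY KEY (CustomerID, OrderID, SystemType)
    );
    -- Hypothetical child table: its foreign key must carry all three columns.
    CREATE TABLE order_lines (
        CustomerID  TEXT NOT NULL,
        OrderID     TEXT NOT NULL,
        SystemType  TEXT NOT NULL,
        line_no     INTEGER NOT NULL,
        product     TEXT,
        PRIMARY KEY (CustomerID, OrderID, SystemType, line_no),
        FOREIGN KEY (CustomerID, OrderID, SystemType)
            REFERENCES orders (CustomerID, OrderID, SystemType)
    );
    """
)

# The same OrderID appears for two customers; the composite key keeps them apart.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [("C1", "1001", "ERP", 50.0), ("C2", "1001", "ERP", 75.0)],
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?, ?, ?)",
    [("C1", "1001", "ERP", 1, "widget"), ("C2", "1001", "ERP", 1, "gadget")],
)

# Every join has to repeat all three key columns.
rows = conn.execute(
    """
    SELECT o.CustomerID, o.OrderID, l.product
    FROM orders AS o
    JOIN order_lines AS l
      ON  l.CustomerID = o.CustomerID
      AND l.OrderID    = o.OrderID
      AND l.SystemType = o.SystemType
    ORDER BY o.CustomerID
    """
).fetchall()
print(rows)  # [('C1', '1001', 'widget'), ('C2', '1001', 'gadget')]
```

Inserting a second row with the same three values raises `sqlite3.IntegrityError`, which is the uniqueness guarantee the composite key provides.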

We would like to use a surrogate key, but when another extract comes into the system we cannot identify the rows, because our surrogate key is not included in the customer's extract.

Should we use the three columns as a composite primary key, or should we concatenate them into one column and use that as the primary key? I understand an autoincrement key is not an option.
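One risk with concatenating the columns is that plain concatenation can map two different key tuples to the same string. The minimal Python sketch below compares it with a delimited key; the helper names `naive_key` and `delimited_key` are hypothetical, and the delimited version assumes the separator never occurs inside a key value (it rejects values that contain it).

```python
def naive_key(customer_id: str, order_id: str, system_type: str) -> str:
    # Plain concatenation: the boundaries between the columns are lost.
    return customer_id + order_id + system_type


def delimited_key(customer_id: str, order_id: str, system_type: str, sep: str = "|") -> str:
    # Assumes `sep` never appears inside a key value; refuse to build a key if it does.
    parts = (customer_id, order_id, system_type)
    if any(sep in p for p in parts):
        raise ValueError(f"key value contains the separator {sep!r}")
    return sep.join(parts)


a = ("C1", "23", "ERP")
b = ("C12", "3", "ERP")

print(naive_key(*a) == naive_key(*b))          # True: both become "C123ERP"
print(delimited_key(*a) == delimited_key(*b))  # False: "C1|23|ERP" vs "C12|3|ERP"
```

A composite primary key sidesteps this problem, since the database compares the three columns separately and no encoding rule has to be maintained.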

Could you share your thoughts on the preferred key design for a system like this?

Thanks,

Mathias

asked Jan 01 '26 20:01 by Mathias Florin


1 Answer

In ETL scenarios it's usual to have both. You need the natural key to distinguish new rows from updated ones, and you must maintain its uniqueness as you load data. Then assign a surrogate key to any new rows if you need one. Foreign keys in other tables can reference either the surrogate or the natural key, whichever you prefer. In ETL scenarios, if the natural key attributes already exist as foreign key references in other tables, then cascading surrogate keys through the schema can be much more expensive than just leaving the natural key values as they are.
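The approach in this answer can be sketched as follows, assuming SQLite through Python's standard `sqlite3` module; the `orders` table, the `order_sk` column and the `load_extract` helper are hypothetical. The natural key carries a UNIQUE constraint and is used to tell new rows from updated ones, while a surrogate `order_sk` is assigned only to new rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_sk    INTEGER PRIMARY KEY,   -- surrogate key, generated by SQLite
        CustomerID  TEXT NOT NULL,
        OrderID     TEXT NOT NULL,
        SystemType  TEXT NOT NULL,
        amount      REAL,
        UNIQUE (CustomerID, OrderID, SystemType)  -- natural key stays unique
    )
    """
)


def load_extract(conn, extract):
    """Load (CustomerID, OrderID, SystemType, amount) rows; return (new, updated) counts."""
    new = updated = 0
    for customer_id, order_id, system_type, amount in extract:
        # The extract carries no surrogate, so match on the natural key.
        found = conn.execute(
            "SELECT order_sk FROM orders"
            " WHERE CustomerID = ? AND OrderID = ? AND SystemType = ?",
            (customer_id, order_id, system_type),
        ).fetchone()
        if found is None:
            # New row: SQLite assigns the next order_sk automatically.
            conn.execute(
                "INSERT INTO orders (CustomerID, OrderID, SystemType, amount)"
                " VALUES (?, ?, ?, ?)",
                (customer_id, order_id, system_type, amount),
            )
            new += 1
        else:
            # Existing row: update in place, keeping its surrogate key.
            conn.execute(
                "UPDATE orders SET amount = ? WHERE order_sk = ?",
                (amount, found[0]),
            )
            updated += 1
    conn.commit()
    return new, updated


print(load_extract(conn, [("C1", "1001", "ERP", 50.0), ("C2", "1001", "ERP", 75.0)]))
# (2, 0)
print(load_extract(conn, [("C1", "1001", "ERP", 55.0), ("C1", "1002", "ERP", 10.0)]))
# (1, 1)
print(conn.execute(
    "SELECT order_sk, CustomerID, OrderID, amount FROM orders ORDER BY order_sk"
).fetchall())
# [(1, 'C1', '1001', 55.0), (2, 'C2', '1001', 75.0), (3, 'C1', '1002', 10.0)]
```

Here SQLite happens to generate the surrogate through `INTEGER PRIMARY KEY`, but any generator would do, because incoming extract rows are never matched on the surrogate; the UNIQUE natural key is what makes each reload identifiable.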

answered Jan 05 '26 06:01 by nvogel


