I have a raw data with millions of rows and no constraints and I want to identify unique columns for primary keys through SQL code.
Is there any way we can identify primary key candidates through SQL code?
You could try the following queries
select count(<potential_column>),count(distinct <potential column>) from <table>
When the counts match, you have a candidate key for consideration.
For example, if I had a table of people called PEOPLE, I might do something like.
select 'FullName' as FldName, count(fullname) as Tot,count(distinct fullName) as TotD
from People
union
select 'SSN' as FldName, count(SSN) as Tot,count(distinct SSN) as TotD
from People
This would return two rows, showing the field name and counts. Any row where tot matches totd would be a candidate, although in this example, I'd rely on SSN (Social Security number) based on knowledge of the data.
You could run the same query using concatenated fields, such as
select 'First/Last' as FldName, count(Firstname+lastName) as Tot,
count(distinct firstname+LastName) as TotD
from People
if you are looking for compound keys, although trying to decide which fields to concatenate would require a better understanding of your data
You can use the following code to produce a list of unique columns for a table:
DECLARE
@TableSchema sysname = 'dbo'
,@TableName sysname = 'Task'
,@ColumnName sysname=''
,@sql nvarchar(max)=''
;
---
DECLARE c CURSOR FOR (SELECT [COLUMN_NAME] FROM [INFORMATION_SCHEMA].[COLUMNS] WHERE [TABLE_SCHEMA] = @TableSchema AND [TABLE_NAME] = @TableName);
OPEN c;
FETCH NEXT FROM c INTO @ColumnName;
WHILE @@FETCH_STATUS = 0 BEGIN
SET @sql += 'SELECT '''+@ColumnName+''' FROM '+QUOTENAME(@TableSchema)+'.'+QUOTENAME(@TableName)+' HAVING COUNT(*)=COUNT(DISTINCT '+QUOTENAME(@ColumnName)+')';
FETCH NEXT FROM c INTO @ColumnName;
IF @@FETCH_STATUS = 0 SET @sql += 'UNION ALL ';
END;
CLOSE c;
DEALLOCATE c;
EXEC SP_EXECUTESQL @sql;
---
This is not necessarily the best way to choose your primary key though unless you can guarantee the column won't have duplicates in the future. This code can be useful however when analysing datasets to discover which columns tend to contain unique values.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With