Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Identify primary key candidates through SQL code

I have a raw data with millions of rows and no constraints and I want to identify unique columns for primary keys through SQL code.

Is there any way we can identify primary key candidates through SQL code?

like image 909
Ajay Avatar asked Dec 23 '14 11:12

Ajay


2 Answers

You could try the following queries

select count(<potential_column>),count(distinct <potential column>) from <table>

When the counts match, you have a candidate key for consideration.

For example, if I had a table of people called PEOPLE, I might do something like.

select 'FullName' as FldName, count(fullname) as Tot,count(distinct fullName) as TotD 
from People
union
select 'SSN' as FldName, count(SSN) as Tot,count(distinct SSN) as TotD 
from People

This would return two rows, showing the field name and counts. Any row where tot matches totd would be a candidate, although in this example, I'd rely on SSN (Social Security number) based on knowledge of the data.

You could run the same query using concatenated fields, such as

  select 'First/Last' as FldName, count(Firstname+lastName) as Tot,
          count(distinct firstname+LastName) as TotD 
    from People

if you are looking for compound keys, although trying to decide which fields to concatenate would require a better understanding of your data

like image 83
Sparky Avatar answered Sep 23 '22 21:09

Sparky


You can use the following code to produce a list of unique columns for a table:

DECLARE
     @TableSchema sysname = 'dbo'
    ,@TableName sysname = 'Task'
    ,@ColumnName sysname=''
    ,@sql nvarchar(max)=''
;
---
DECLARE c CURSOR FOR (SELECT [COLUMN_NAME] FROM [INFORMATION_SCHEMA].[COLUMNS] WHERE [TABLE_SCHEMA] = @TableSchema AND [TABLE_NAME] = @TableName);
OPEN c;
FETCH NEXT FROM c INTO @ColumnName;
WHILE @@FETCH_STATUS = 0 BEGIN
    SET @sql += 'SELECT '''+@ColumnName+''' FROM '+QUOTENAME(@TableSchema)+'.'+QUOTENAME(@TableName)+' HAVING COUNT(*)=COUNT(DISTINCT '+QUOTENAME(@ColumnName)+')';
    FETCH NEXT FROM c INTO @ColumnName;
    IF @@FETCH_STATUS = 0 SET @sql += 'UNION ALL ';
END;
CLOSE c;
DEALLOCATE c;
EXEC SP_EXECUTESQL @sql;
---

This is not necessarily the best way to choose your primary key though unless you can guarantee the column won't have duplicates in the future. This code can be useful however when analysing datasets to discover which columns tend to contain unique values.

like image 33
AeroX Avatar answered Sep 20 '22 21:09

AeroX