Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query to find possible columns that should be foreign keys?

I am looking for a query to help with a cleanup of an old database that doesn't have many relationships where it should. I don't need it to be perfect, just to help guide me in beginning cleanup and starting to enforce data integrity.

I am assuming that all tables have the proper primary key and that a column in a not-yet-related table has the same name.

Theoretically speaking I could have three tables, one which has a composite key (I wouldn't choose to design the db like this, but am limited in the cleanup of it and these types of composite/primary/foreign keys are common):

Case.CaseId (PK)
Workstep.WorkstepId (PK)
Workstep.CaseId (PK,FK)
WorkQueue.CaseId (isn't related TO Case.CaseId but should be)

What I would like to be able to do is run a query and come up with results that give me something like table name, column name, and foreign key of the table that isn't related but should be, e.g.:

TABLE NAME, COLUMN NAME, SHOULD BE RELATED TO PRIMARY KEY
WorkQueue, CaseId, Case.CaseId

See the SQL I am using below but it is returning any primary key, even ones that are both a primary key but also are part of a foreign key. Using my example again and the SQL below, instead of returning 1 row I get 2:

TABLE NAME, COLUMN NAME, SHOULD BE RELATED TO PRIMARY KEY
WorkQueue, CaseId, Workstep.CaseId (I don't want this row since it is also related to the 'real' primary key, Case.CaseId)
WorkQueue, CaseId, Case.CaseId

    SELECT 
    SubqueryAllPotentialForeignKeys.TABLE_NAME
    ,SubqueryAllPotentialForeignKeys.COLUMN_NAME
    ,(PrimaryKeys.TABLE_NAME + '.' + PrimaryKeys.COLUMN_NAME) as 'Possible Primary Key'

--all potential foreign keys (column name matches another column name but there is no reference from this column anywhere else)
FROM
    (   
    SELECT
        INFORMATION_SCHEMA.COLUMNS.TABLE_NAME
        ,INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME
    FROM
        INFORMATION_SCHEMA.COLUMNS
    WHERE
        --only get columns that are in multiple tables
        INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME IN 
        (
            SELECT COLUMN_NAME FROM
                (SELECT COLUMN_NAME, COUNT(COLUMN_NAME) AS ColNameCount FROM INFORMATION_SCHEMA.COLUMNS GROUP BY COLUMN_NAME) AS SubQueryColumns
            WHERE ColNameCount > 1
        )

        --only get the table.column if not part of a foreign or primary key
        EXCEPT
        (
            SELECT TABLE_NAME, COLUMN_NAME  FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE 
        )

    ) AS SubqueryAllPotentialForeignKeys

LEFT JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS PrimaryKeys ON 
    SubqueryAllPotentialForeignKeys.COLUMN_NAME = PrimaryKeys.COLUMN_NAME

--when finding possible keys for our columns that don't have references, limit to primary keys
WHERE 
    PrimaryKeys.CONSTRAINT_NAME LIKE '%PK_%'

ORDER BY TABLE_NAME, COLUMN_NAME
like image 763
Mario Avatar asked Nov 18 '25 23:11

Mario


1 Answers

This may not be the most beautiful thing in the world but works pretty darn well:

    SELECT * FROM
(
    SELECT 
        SubqueryAllPotentialForeignKeys.TABLE_NAME
        ,SubqueryAllPotentialForeignKeys.COLUMN_NAME
        ,(PrimaryKeys.TABLE_NAME + '.' + PrimaryKeys.COLUMN_NAME) as 'Possible Primary Key'

    --all potential foreign keys (column name matches another column name but there is no reference from this column anywhere else)
    FROM
        (   
        SELECT
            INFORMATION_SCHEMA.COLUMNS.TABLE_NAME
            ,INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME
        FROM
            INFORMATION_SCHEMA.COLUMNS
        WHERE
            --only get columns that are in multiple tables
            INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME IN 
            (
                SELECT COLUMN_NAME FROM
                    (SELECT COLUMN_NAME, COUNT(COLUMN_NAME) AS ColNameCount FROM INFORMATION_SCHEMA.COLUMNS GROUP BY COLUMN_NAME) AS SubQueryColumns
                WHERE ColNameCount > 1
            )

            --only get the table.column if not part of a foreign or primary key
            EXCEPT
            (
                SELECT TABLE_NAME, COLUMN_NAME  FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE 
            ) 

        ) AS SubqueryAllPotentialForeignKeys

    LEFT JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS PrimaryKeys ON 
        SubqueryAllPotentialForeignKeys.COLUMN_NAME = PrimaryKeys.COLUMN_NAME

    --when finding possible keys for our columns that don't have references, limit to primary keys
    WHERE 
        PrimaryKeys.CONSTRAINT_NAME LIKE '%PK_%'
) AS Subquery

--exclude all keys that are primary but also foreign
WHERE [Possible Primary Key] NOT IN
    (
        SELECT (TABLE_NAME + '.' + COLUMN_NAME)
        FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
        WHERE CONSTRAINT_NAME LIKE 'FK_%'
    ) 

ORDER BY TABLE_NAME, COLUMN_NAME
like image 140
Mario Avatar answered Nov 21 '25 14:11

Mario



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!