Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete all rows in all tables which aren't used in any FK relation anymore

In order to trim a production database for loading in a test system, we've deleted rows in many tables. This now left us with cruft in a couple of tables, namely rows which aren't used in any FK relation anymore. What I want to achieve is like the garbage collection in Java.

Or to put it another way: If I have M tables in the database. N of them (i.e. most but not all) have foreign key relations. I've deleted a couple of high level rows (i.e. which only have outgoing FK relations) via SQL. This leaves the rows in the related tables alone.

Does someone have a SQL stored procedure or a Java program which finds the N tables and then follows all the FK relations to delete rows which are no longer needed.

If finding the N tables to too complex, I could probably provide the script a list of tables to scan or, preferably, a negative list of tables to ignore.

Also note:

  1. We have some tables which are used in many (>50) FK relations, i.e. A, B, C, ... all use rows in Z.
  2. All FK relations use the technical PK column which is always a single column.
like image 349
Aaron Digulla Avatar asked Oct 08 '14 12:10

Aaron Digulla


1 Answers

This issue is addressed in the MySQL Performance blog, http://www.percona.com/blog/2011/11/18/eventual-consistency-in-mysql/

He provides the following meta query, to generate queries that will identify orphaned nodes;

SELECT CONCAT(
 'SELECT ', GROUP_CONCAT(DISTINCT CONCAT(K.CONSTRAINT_NAME, '.', P.COLUMN_NAME,
  ' AS `', P.TABLE_SCHEMA, '.', P.TABLE_NAME, '.', P.COLUMN_NAME, '`') ORDER BY P.ORDINAL_POSITION), ' ',
 'FROM ', K.TABLE_SCHEMA, '.', K.TABLE_NAME, ' AS ', K.CONSTRAINT_NAME, ' ',
 'LEFT OUTER JOIN ', K.REFERENCED_TABLE_SCHEMA, '.', K.REFERENCED_TABLE_NAME, ' AS ', K.REFERENCED_TABLE_NAME, ' ',
 ' ON (', GROUP_CONCAT(CONCAT(K.CONSTRAINT_NAME, '.', K.COLUMN_NAME) ORDER BY K.ORDINAL_POSITION),
 ') = (', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ') ',
 'WHERE ', K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME, ' IS NULL;'
 ) AS _SQL
 FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE K
 INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE P
 ON (K.TABLE_SCHEMA, K.TABLE_NAME) = (P.TABLE_SCHEMA, P.TABLE_NAME)
 AND P.CONSTRAINT_NAME = 'PRIMARY'
 WHERE K.REFERENCED_TABLE_NAME IS NOT NULL
 GROUP BY K.CONSTRAINT_NAME;

I converted this to find childless parents, producing;

SELECT CONCAT(
 'SELECT ', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ' ',

 'FROM ', K.REFERENCED_TABLE_SCHEMA, '.', K.REFERENCED_TABLE_NAME, ' AS ', K.REFERENCED_TABLE_NAME, ' ',
 'LEFT OUTER JOIN ', K.TABLE_SCHEMA, '.', K.TABLE_NAME, ' AS ', K.CONSTRAINT_NAME, ' ',
 ' ON (', GROUP_CONCAT(CONCAT(K.CONSTRAINT_NAME, '.', K.COLUMN_NAME) ORDER BY K.ORDINAL_POSITION),
 ') = (', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ') ',
 'WHERE ', K.CONSTRAINT_NAME, '.', K.COLUMN_NAME, ' IS NULL;'
 ) AS _SQL
 FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE K
 INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE P
 ON (K.TABLE_SCHEMA, K.TABLE_NAME) = (P.TABLE_SCHEMA, P.TABLE_NAME)
 AND P.CONSTRAINT_NAME = 'PRIMARY'
 WHERE K.REFERENCED_TABLE_NAME IS NOT NULL
 GROUP BY K.CONSTRAINT_NAME;
like image 153
harvey Avatar answered Sep 28 '22 11:09

harvey