Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Re-assigning IDs in a non-IDENTITY type field in SQL Server database

WARNING: This tale of woe contains examples of code smells and poor design decisions, and technical debt.

If you are conversant with SOLID principles, practice TDD and unit test your work, DO NOT READ ON. Unless you want a good giggle at someone's misfortune and gloat in your own awesomeness knowing that you would never leave behind such a monumental pile of crap for your successors.

So, if you're sitting comfortably then I'll begin.

In this app that I have inherited and been supporting and bug fixing for the last 7 months I have been left with a DOOZY of a balls up by a developer that left 6 and a half months ago. Yes, 2 weeks after I started.

Anyway. In this app we have clients, employees and visits tables.

There is also a table called AppNewRef (or something similar) that ... wait for it ... contains the next record ID to use for each of the other tables. So, may contain data such as :-

TypeID    Description     NextRef
   1      Employees       804
   2      Clients         1708
   3      Visits          56783

When the application creates new rows for Employees, it looks in the AppNewRef table, gets the value, uses that value for the ID, and then updates the NextRef column. Same thing for Clients, and Visits and all the other tables whose NextID to use is stored in here.

Yes, I know, no auto-numbering IDENTITY columns on this database. All under the excuse of "when it was an Access app". These ID's are held in the (VB6) code as longs. So, up to 2 billion 147 million records possible. OK, that seems to work fairly well. (apart from the fact that the app is updating and taking care of locking / updating, etc., and not the database)

So, our users are quite happily creating Employees, Clients, Visits etc. The Visits ID is steady increasing a few dozen at a time. Then the problems happen. Our clients are causing database corruptions while creating batches of visits because the server is beavering away nicely, and the app becomes unresponsive. So they kill the app using task manager instead of being patient and waiting. Granted the app does seem to lock up.

Roll on to earlier this year and developer Tim (real name. No protecting the guilty here) starts to modify the code to do the batch updates in stages, so that the UI remains 'responsive'. Then April comes along, and he's working his notice (you can picture the scene now, can't you ?) and he's beavering away to finish the updates.

End of April, and beginning of May we update some of our clients. Over the next few months we update more and more of them.

Unseen by Tim (real name, remember) and me (who started two weeks before Tim left) and the other new developer that started a week after, the ID's in the visit table start to take huge leaps upwards. By huge, I mean 10000, 20000, 30000 at a time. Sometimes a few hundred thousand.

Here's a graph that illustrates the rapid increase in IDs used.

Take a look at his graph

Roll on November. Customer phones Tech Support and reports that he's getting an error. I look at the error message and ask for the database so I can debug the code. I find that the value is too large for a long. I do some queries, pull the information, drop it into Excel and graph it.

I don't think making the code handle anything longer than a long for the ID's is the right approach, as this app passes that ID into other DLL's and OCX's and breaking the interface on those just seems like a whole world of hurt that I don't want to encounter right now.

One potential idea that I'm investigating is try to modify the ID's so that I can get them down to a lower level. Essentially filling the gaps. Using the ROW_NUMBER function

What I'm thinking of doing is adding a new column to each of the tables that have a Foreign Key reference to these Visit ID's (not a proper foreign key mind, those constraints don't exist in this database). This new column could store the old (current) value of the Visit ID (oh, just to confuse things; on some tables it's called EventID, and on some it's called VisitID).

Then, for each of the other tables that refer to that VisitID, update to the new value.

Ideas ? Suggestions ? Snippets of T-SQL to help all gratefully received.

like image 872
cometbill Avatar asked Nov 14 '12 09:11

cometbill


People also ask

How do you modify an identity column in SQL Server?

You can not update identity column. SQL Server does not allow to update the identity column unlike what you can do with other columns with an update statement.

How can I return ID after insert in SQL Server?

We use SCOPE_IDENTITY() function to return the last IDENTITY value in a table under the current scope. A scope can be a module, trigger, function or a stored procedure. We can consider SQL SCOPE_IDENTITY() function similar to the @@IDENTITY function, but it is limited to a specific scope.

How do I manage the identity column in replication?

Identity ranges are managed by the Publisher and propagated to Subscribers by the Merge Agent (in a republishing hierarchy, ranges are managed by the root Publisher and the republishers). The identity values are assigned from a pool at the Publisher.


2 Answers

Option one:

Explicitly constrain all of your foreign key relationships, and set them to be ON UPDATE CASCADE.

This will mean that whenever you change the ID, the foreign keys will automatically be updated.

Then you just run something like this...

WITH
  resequenced AS
(
  SELECT
    ROW_NUMBER() OVER (ORDER BY id) AS newID,
    *
  FROM
    yourTable
)
UPDATE
  resequenced
SET
  id = newID

I haven't done this in ages, so I forget if it causes problems mid-update by having two records with the same id value. If it does, you could do somethign like this first...

UPDATE yourTable SET id = -id


Option two:

Ensure that none of your foreign key relationships are explicitly defined. If they are, note them donw and remove them.

Then do something like...

CREATE TABLE temp AS
  newID INT IDENTITY (1,1),
  oldID INT
)

INSERT INTO temp (oldID) SELECT id FROM yourTable

/* Do this once for the table you are re-identifiering              */
/* Repeat this for all fact tables holding that ID as a foreign key */
UPDATE
  factTable
SET
  foreignID = temp.newID
FROM
  temp
WHERE
  foreignID = temp.oldID

Then re-apply any existing foreign key relationships.

This is a pretty scary option. If you forget to update a table, you just borked your data. But, you can give that temp table a much nicer name and KEEP it.


Good luck. And may the lord have mercy on your soul. And Tim's if you ever meet him in a dark alley.

like image 181
MatBailie Avatar answered Sep 21 '22 18:09

MatBailie


I would create a numbers table that has just a sequence from 1 to whatever max with an increment of 1 is for long and then change the logic of getting the maxid for visitid and maybe the others doing a right join between the numbers and the visits table. and then you can just look for te min of that number

select min(number) from visits right join numbers on visits.id = numbers.number

That way you get all the gaps filled in without having to change any of the other tables.

but I would just redo the whole database.

like image 29
chrissie1 Avatar answered Sep 21 '22 18:09

chrissie1