Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Table containing TEXT column growing continuously

We've got a table in a production system which (for legacy reasons) is running SQL 2005 (9.0.5266) and contains a TEXT column (along with a few other columns of various datatypes).

All of a sudden (since a week ago) we noticed the size of this one table increasing linearly by 10-15GB per day (whereas previously it has always remained at a constant size). The table is a queue for a messaging system, and as such the data in it completely refreshes itself every few seconds. At any one time there could be anywhere from 0 to around 1000 rows, but it fluctuates rapidly as messages are inserted, and sent (at which point they're deleted).

We can't find anything that was changed on the day the growth started - so have no obvious potential cause identified at this stage.

One "obvious" culprit is the TEXT column, and so we checked to see if any massive values were now being stored, but (using DATALENGTH) we found no single rows above around ~32k. We've run CHECKDB, updated space usage, rebuild all indexes, etc - nothing reduces the size (and CHECKDB showed no errors).

We've queried sys.allocation_units and the size increase is definitely LOB_DATA (which show total_pages and used_pages increasing together at a constant rate).

To reduce the database size last night we simple created a new table along-side the one in question (which is luckily referenced via a view by the application), dropped the old table, and renamed the new one. We left last night, taking comfort in the fact that we'd alleviated the space issues, and that we had a backup of the dodgy table to investigate further today. However, this morning the table size is already up to 14GB (and growing), while there are only the usual ~500 rows in the table, and MAX(DATALENGTH(text_column)) is only showing around 35k.

Any ideas as to what could be causing this "runaway" growth, or anything else that we could try or query to get more information about what exactly is using the space?

Cheers, Dave

like image 542
DBDave Avatar asked Nov 14 '22 00:11

DBDave


1 Answers

This is a general problem in dealing with queues. The article linked talks about Service Broker queues, but the issue is the same for ordinary tables used as queues. If you have a busy system with generous resources (CPU, memory, disk IO) and you push a queue on this system to high throughput, then a large portion of these resources will be used to handle the two operations: enqueue (ie. INSERT) and dequeue (ie. DELETE). However, the full lifecycle of the record requires three operations: INSERT, DELETE and ghost purge. They cost roughly the same in terms on CPU/memory/disk IO needs, so if you use that queue for say 90% of the system resources then you should allocate 30% resources to each. But only the first two are under your control (ie. explicit statements running in user sessions). The third one, the ghost purge, is a background process controlled by SQL Server, and there is no chance the ghost cleanup process will be allowed to consume 30% resources. This is a fundamental issue and, if you push the pedal-to-the-metal for long enough time your *will hit it. Once ghost records accumulate and pass system/workload specific threshold the performance will degrade quickly and the symptoms will spiral to abysmal performance (a negative feedback loop forms).

Luckily, since you do not use Service Broker queues but real tables as queues, you have some better tools at your disposal, like ALTER TABLE REORGANIZE and ALTER TABLE REBUILD. By far the best solution is an online index/table rebuild. SQL Server 2012 supports online operations on tables containing BLOBs and you can eleverage that. Of course you would have to get rid of the deprecated obsolete TEXT type and use VARCHAR(MAX), but that goes w/o saying.

As a side note:

If you have pages with nothing but ghost records on them, then you will not read those pages again and they won't get marked for cleanup

This is incorrect. Pages with nothing but ghosts will be detected and purged by scans. As I said, the issue is not detection, is resources. If you push your system enough, you will race ahead of the ghost cleanup and he will never catch up.

like image 73
Remus Rusanu Avatar answered Feb 06 '23 20:02

Remus Rusanu