Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Migrating from TFS to VSTS - Reducing TPC Database Size

We are planning a migration from an on-premises TFS instance to VSTS very shortly. Ahead of the migration, we run the pre-requisite Validation task and obtained the following output report on the TPC database size:

"The database is currently 191GBs. This is above the recommended size of 150GBs to use the DACPAC import method. The largest table size is currently 172GBs. This is above the recommended size of 20GBs to use the DACPAC import method.

Validation completed 'Validate Collection Database Size' with result Warning, message The largest table size is currently 172GBs. This is above the recommended size of 20GBs to use the DACPAC import method."

We are therefore keen to reduce the size of the TPC database and have two main considerations:

  1. Shrink the Database and generate the DACPAC from the resultant output.

  2. Delete any of the following objects which are unused or redundant:

    a) Older Workspaces
    b) Build Results
    c) Redundant Team Projects
    d) Unused Files
    e) Test Attachments created during test runs
    f) XAML Builds

Would therefore appreciate some advice or feedback on the pros and cons of either approach and which would be recommended.

like image 559
hitman126 Avatar asked Jan 03 '23 16:01

hitman126


1 Answers

Given that you need to reduce your largest table by 150GB, I wonder whether DACPAC is every going to be an option. That said, it's always a good idea to clean up your TFS instance. Your first step won't help a lot until you've managed to strip enough data to actually get any benefit out of a shrink.

Your identified actions would indeed help, most are already documented here. Queries that can aid in detecting where your space is allocated are also found in this recent support ticket.

Delete old workspaces

Deleting workspaces and shelvesets can reduce your migration and upgrade times considerably. either use the tf commandline or leverage a tool like the TFS SideKicks to identify and delete these.

Build results

Not just build results, but often overlooked the actual build records can take up a considerable amount of data. Use tfsbuild destroy (XAML) to permanently delete the build records. In the past, I've encountered clients who had 1.8 million "hidden" builds in their database and removing them shaved off quite a considerable amount of data. These records were kept around for the warehouse.

Old team projects

Of course, destroying old team projects can give back a lot of data. Anything you don't need to send to azure helps. You could also consider splitting the collection and to leave behind the old projects. That will give you the option to detach that collection and store it somewhere, should you ever need that data again.

Redundant files

Deleted branches are a very common hidden size hog. When deleting things in TFVC, they are not actually deleted, they're just hidden. Finding deleted files and especially old development or feature branches can give you back a lot of data. Use tf destroy to get rid of them.

You may also want to look for checked in nuget package folders, those can quickly rack up a lot of space as well.

Test Attachments

Ohh yes, especially when you use test attachments, these can grow like crazy, depending on your TFS version either use the built-in test attachment cleanup features or use the Test Attachment Cleaner from the TFS power tools.

XAML Builds

The build definitions themselves won't take a lot of db space, but the build results may. But those have been covered in a previous section.

Git Repositories

You may have data in your git repositories that are no longer accessible due to force pushes or deleted branches. It's also possible that certain data in Git could be packed more efficiently. To clean your repositories you have to clone them locally, clean them up, delete the remote repo from TFS and push the cleaned copy to a new repository (you can use the same name as the old one). Doing this will break references with existing build definitions and you will have to fix these up. While you're at it, you could also run the BFG repo Cleaner and convert the repositories to enable Git-LFS support to handle large binary files in your repositories more elegantly.

git clone --mirror <<repo>>
# optionally run BFG repo cleaner at thi s point
git reflog expire --expire=now --all 
git gc --prune=now --aggressive
git repack -adf
# Delete and recreate the remote repository with the same name
git push origin --all
git push origin --tags

Work item (attachments)

Work items can gather up a considerable amount of data, especially when people start attaching large attachments to them. You can use witadmin destroywi to delete workitems with unreasonably large attachments. To retain the work item, but delete its attachments you can delete the attachments from the current work item and then clone it. After cloning, destroy the old work item to allow the attachments to be cleaned up.

Old work items that you no longer need (say the sprint ites from 6 years ago) can also be deleted. My colleague Rene has a nice tool that allows you to bulk-destroy by first creating the appropriate work item query.

Be sure to run the cleanup jobs

TFS often doesn't directly prune data from the database, in many cases it just marks stuff as deleted for latest processing. To force the cleanup to happen immediately, run the following stored procedures on your Project Collection database:

EXEC prc_CleanupDeletedFileContent 1
# You may have to run the following command multiple times, the last
# parameter is the batch size, if there are more items to prune than the 
# passed in number, you will have to run it multiple times
EXEC prc_DeleteUnusedFiles 1, 0, 100000

Other useful queries

To identify how much data is stored in each section, there are a few useful queries you can run. The actual query depends on your TFS version, but since you're preparing for migration I suspect you're on TFS 2017 or 2018 at the moment.

Find the largest tables:

SELECT TOP 10 o.name, 
SUM(reserved_page_count) * 8.0 / 1024 SizeInMB,
SUM(CASE 
WHEN p.index_id <= 1 THEN p.row_count
ELSE 0
END) Row_Count
FROM sys.dm_db_partition_stats p
JOIN sys.objects o
ON p.object_id = o.object_id
GROUP BY o.name
ORDER BY SUM(reserved_page_count) DESC

Find the largest content contributors:

SELECT Owner = 
CASE
WHEN OwnerId = 0 THEN 'Generic' 
WHEN OwnerId = 1 THEN 'VersionControl'
WHEN OwnerId = 2 THEN 'WorkItemTracking'
WHEN OwnerId = 3 THEN 'TeamBuild'
WHEN OwnerId = 4 THEN 'TeamTest'
WHEN OwnerId = 5 THEN 'Servicing'
WHEN OwnerId = 6 THEN 'UnitTest'
WHEN OwnerId = 7 THEN 'WebAccess'
WHEN OwnerId = 8 THEN 'ProcessTemplate'
WHEN OwnerId = 9 THEN 'StrongBox'
WHEN OwnerId = 10 THEN 'FileContainer'
WHEN OwnerId = 11 THEN 'CodeSense'
WHEN OwnerId = 12 THEN 'Profile'
WHEN OwnerId = 13 THEN 'Aad'
WHEN OwnerId = 14 THEN 'Gallery'
WHEN OwnerId = 15 THEN 'BlobStore'
WHEN OwnerId = 255 THEN 'PendingDeletion'
END,
SUM(CompressedLength) / 1024.0 / 1024.0 AS BlobSizeInMB
FROM tbl_FileReference AS r
JOIN tbl_FileMetadata AS m
ON r.ResourceId = m.ResourceId
AND r.PartitionId = m.PartitionId
WHERE r.PartitionId = 1
GROUP BY OwnerId
ORDER BY 2 DESC

If file containers are the issue:

SELECT CASE WHEN Container = 'vstfs:///Buil' THEN 'Build'
WHEN Container = 'vstfs:///Git/' THEN 'Git'
WHEN Container = 'vstfs:///Dist' THEN 'DistributedTask'
ELSE Container 
END AS FileContainerOwner,
SUM(fm.CompressedLength) / 1024.0 / 1024.0 AS TotalSizeInMB
FROM (SELECT DISTINCT LEFT(c.ArtifactUri, 13) AS Container,
fr.ResourceId,
ci.PartitionId
FROM tbl_Container c
INNER JOIN tbl_ContainerItem ci
ON c.ContainerId = ci.ContainerId
AND c.PartitionId = ci.PartitionId
INNER JOIN tbl_FileReference fr
ON ci.fileId = fr.fileId
AND ci.DataspaceId = fr.DataspaceId
AND ci.PartitionId = fr.PartitionId) c
INNER JOIN tbl_FileMetadata fm
ON fm.ResourceId = c.ResourceId
AND fm.PartitionId = c.PartitionId
GROUP BY c.Container
ORDER BY TotalSizeInMB DESC
like image 62
jessehouwing Avatar answered Jan 05 '23 07:01

jessehouwing