How to remove largefiles from Mercurial repo

See also this question.

Without knowing what I was doing, I enabled the largefiles extension, committed a file, and pushed it to Kiln. Now I know the error of my ways, and I need to permanently revert this change.

I followed the guidance from SO on the subject, and I can remove largefiles locally, but this doesn't affect the remote repos in Kiln. I have tried opening the repo in KilnRepositories on the Kiln server and nuking the largefiles folder (as well as deleting 'largefiles' from the requires file), but after a few pushes/pulls the folder and the requires line come back.
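
For reference, the file I'm talking about is .hg/requires; in the affected repo it looks something like this (exact entries vary by Mercurial version), and the largefiles line is the one that keeps coming back:

    $ cat .hg/requires
    dotencode
    fncache
    largefiles
    revlogv1
    store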

Is there a way to make this permanent? (Setting the requires file to read-only doesn't work either.)

Asked Mar 14 '13 by Christopher


1 Answer

Note: This is true at least of TortoiseHg 2.4 (Mercurial 2.2.1) through 2.7.1 (Mercurial 2.5.2) on Windows. I won't speak for older or future versions.

After looking through the various Mercurial extensions available, I have concluded that it is generally not possible to convert a repository 'back' once a file has been committed using the largefiles extension.

First, the two rationales for why you do not want largefiles in play on your repos: one and two.

Once a file has been committed as a largefile, to remove it, all references to the '.hglf' path must be removed from the repo. A backout isn't sufficient, as its commit contents will reference the path of the file, including the '.hglf' folder. Once Mercurial sees this, it will write 'largefiles' back to the /.hg/requires file and the repo is once more largefile-locked. Likewise with hg forget and hg remove.
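
As a quick sanity check for whether a repo is still largefile-locked, you can inspect the requires file and search the full history for standins. A rough sketch for a Unix-like shell (hg manifest --all lists files from every revision):

    $ cat .hg/requires                      # a 'largefiles' entry means the repo is locked
    $ hg manifest --all | grep '^\.hglf/'   # any output means standins still exist in history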

Option 1: If your repo is in isolation (you have end-to-end control of the repo in all of its local and remote locations AND no one else has branched from this repo), it may be possible to use the mq extension and strip the changeset. This is probably only a viable option if you have caught the mistake in time.
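
In the Mercurial versions noted above, strip ships with mq, so a sketch of this (with <rev> standing in for the offending changeset) looks like:

    # in .hg/hgrc (or mercurial.ini on Windows):
    # [extensions]
    # mq =

    $ hg strip -r <rev>   # removes <rev> and all its descendants;
                          # a backup bundle is saved to .hg/strip-backup

Remember that strip discards everything descended from <rev>, which is why this is only viable if the mistake was caught early.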

Option 2: If the offending changeset (the largefile commit) is in the draft phase (or can be forced back into draft), then it may be possible to import the commit to mq and unapply the changeset using hg qpop. This is superior to stripping, because it preserves the commit history forward from the extracted changeset. In real life this is frequently not possible, because you've likely already performed merges and pushed/pulled public-phase changesets. However, if caught soon enough, mq can offer a way to salvage the repo.
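
A rough sketch of that mq workflow, again with placeholder revisions and patch names; expect hg qpush to hit conflicts if later changesets touched the same files:

    $ hg phase --force --draft -r <rev>   # force the commit back to draft if needed
    $ hg qimport -r <rev>:tip             # pull <rev> and its descendants into mq
    $ hg qpop -a                          # unapply all the imported patches
    $ hg qdelete <offending-patch>        # discard the largefile commit
    $ hg qpush -a                         # reapply the remaining patches
    $ hg qfinish -a                       # turn the applied patches back into changesets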

Option 3: If the offending changeset is referenced in one and only one place (the original commit), and no one has attempted to backout/remove/forget the changeset (thus creating multiple references), it may be possible to use hg rebase to fold the first child changeset after the offense with the parent changeset of the offense. In doing so, the offensive changeset becomes a new head, which can then be stripped off with mq strip. This can work where attempts to import to mq have failed.
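
A sketch of that maneuver, assuming the rebase extension is enabled alongside mq and using placeholder revisions:

    $ hg rebase -s <first-child> -d <parent-of-offense>   # move descendants off the bad commit
    $ hg strip -r <offending-rev>                         # now a childless head; strip it

Like strip, rebase only touches draft changesets, so the phase caveats from Option 2 apply here too.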

Option 4: If none of the above work, you can use transplant or graft to export all of the non-offending changesets as patches (being careful to export them in the correct sequence), then hg update to the first sane changeset before the offense, mq strip the repo of all forward changesets, and then reapply your exported patches in sequence.
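
One way to script this with plain hg export/import rather than transplant (revisions are placeholders; export oldest-first, and mind the lexical ordering of the patch filenames when reapplying):

    $ hg export -o ../patches/%R.patch <good-revs...>   # %R expands to the revision number
    $ hg update -r <last-sane-rev>
    $ hg strip -r <first-bad-rev>                       # clear everything forward of the sane point
    $ hg import ../patches/*.patch                      # reapply the good changesets in sequence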

Option 5: (What I ultimately did.) Clone the repo locally, so that you have two copies: clone_to_keep and clone_to_destroy. In clone_to_keep, update to the first sane changeset before the offense and strip all forward changesets with mq. Merge back down if you're left with multiple heads. In clone_to_destroy, update to the tip. In Windows Explorer, copy everything in /clone_to_destroy except the .hg and .hglf folders into the /clone_to_keep folder. Inside TortoiseHg, commit all changes in clone_to_keep as a single changeset. Preserve one remote instance of clone_to_destroy in a read-only state for historical purposes, and destroy all others.
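
The same steps as a rough command-line sketch (repo paths and revisions are placeholders; the file copy itself I did in Explorer):

    $ hg clone myrepo clone_to_keep
    $ hg clone myrepo clone_to_destroy
    $ cd clone_to_keep
    $ hg update -r <last-sane-rev>
    $ hg strip -r <first-bad-rev>    # mq strip of all forward changesets
    $ cd ../clone_to_destroy
    $ hg update tip
    # ...copy everything except .hg/ and .hglf/ into clone_to_keep, then:
    $ cd ../clone_to_keep
    $ hg addremove                   # pick up any added/removed files
    $ hg commit -m "reapply tip state without largefiles"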

Option 6: The nuclear option. If all else fails, and if you don't care about the integration of your repo with external systems (bug tracking, CI, etc), you can follow the aforementioned SO post and use the hg convert extension. This will create a new copy of the infected repo, removing all references to the offending changesets; however, it does so by iterating each changeset in the entire repo and committing it to the new repo as a NEW changeset. This creates a repo which is incompatible with any existing branch repos--none of the changeset ids will line up. Unless you have no branch repos, this option will probably never work.
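
If you do go nuclear, the shape of the command is roughly this (convert must be enabled in hgrc; excluding the '.hglf' standin directory via a filemap is my assumption about how to keep the standins out of the new repo):

    # filemap.txt contains the single line:
    # exclude .hglf

    $ hg convert --filemap filemap.txt infected_repo clean_repo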

In all cases, you then have to take your fix and manually reapply it to each distinct repository instance (copy the repo folder, clone, whatever your preferred method).

In the end, it turns out that enabling largefiles is an extremely expensive mistake to make: it's time-consuming and ultimately destructive to fix. I don't recommend ever allowing largefiles to make it into your repos.

Answered Nov 09 '22 by Christopher