git fsck: duplicateEntries: contains duplicate file entries - cannot push to gitlab

Tags: git, gitlab

We have a big Git repository that I want to push to a self-hosted GitLab instance.

The problem is that the GitLab remote will not let me push the repo:

git push --mirror https://mygitlab/xy/myrepo.git

This gives the following error:

Enumerating objects: 1383567, done.
Counting objects: 100% (1383567/1383567), done.
Delta compression using up to 8 threads
Compressing objects: 100% (207614/207614), done.
remote: error: object c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867: 
duplicateEntries: contains duplicate file entries
remote: fatal: fsck error in packed object    

So I did a git fsck:

error in tree c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867: duplicateEntries: contains duplicate file entries
error in tree 0d7286cedf43c65e1ce9f69b74baaf0ca2b73e2b: duplicateEntries: contains duplicate file entries
error in tree 7f14e6474400417d11dfd5eba89b8370c67aad3a: duplicateEntries: contains duplicate file entries

The next thing I did was to inspect the tree with git ls-tree c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867:

100644 blob c233c88b192acfc20548d9d9f0c81c48c6a05a66    fileA.cs
100644 blob 5d6096cb75d27780cdf6da8a3b4d357515f004e0    fileB.cs
100644 blob 5d6096cb75d27780cdf6da8a3b4d357515f004e0    fileB.cs
100644 blob d2a4248bcda39c0dc3827b495f7751b7cc06c816    fileC.xaml

Notice that fileB.cs appears twice, with the same blob hash. I assume this is the problem: why would the same file name appear twice in one tree, pointing at the same blob?
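For reference, a corrected tree can be built from the broken one with git mktree (a minimal sketch, assuming the duplicate entries are byte-identical as in the output above; tree entries are stored sorted, so the duplicate lines are adjacent and uniq drops them):

# git mktree accepts exactly the format git ls-tree prints and
# outputs the hash of the new, deduplicated tree object.
git ls-tree c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867 | uniq | git mktree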

I googled the problem but could not find a way to fix it. One seemingly good resource I found was this: Tree contains duplicate file entries

However, it basically comes down to using git replace, which does not really fix the problem: git fsck will still print the error and the remote will still reject the push.

Then there is this answer, which seems to remove the file entirely (but I need to keep the file, just once instead of twice in the tree): https://stackoverflow.com/a/44672692/826244

Is there any other way to fix this? It really should be possible to get git fsck to pass without errors, right? I am aware that I will need to rewrite the entire history from the corrupted commits onwards. I could not even find a way to identify the commits that point to these specific trees; otherwise I might be able to rebase and patch the corrupted commits. Any help would be greatly appreciated!

UPDATE: Pretty sure I know what to do now, just not yet how to do it (a sketch of the plan follows below):

  1. Create a new tree object from the old tree, corrected with git mktree <- done
  2. Create a new commit that is identical to the one referencing the bad tree, except that it points at the newly fixed tree <- difficult; I cannot easily find the commits for a given tree (my current search runs an hour or more), and once I have found one, I do not know how to create the modified commit
  3. Run git filter-branch -- --all <- should persist the replacements of the commits

Sadly I cannot just use git replace --edit on the bad tree and then run git filter-branch -- --all, because filter-branch seems to work only on commit replacements and ignores tree replacements...
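Here is a sketch of that plan in plain git plumbing (the tree hash is the one from this question; BAD_COMMIT is a placeholder for a commit found by the search; the rev-list loop is brute force, which is why it is slow on a repository this size):

BAD_TREE=c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867
FIXED_TREE=$(git ls-tree $BAD_TREE | uniq | git mktree)

# Step 2a: find every commit whose root tree or subtree is the bad tree.
for commit in $(git rev-list --all); do
  if [ "$(git rev-parse "$commit^{tree}")" = "$BAD_TREE" ] ||
     git ls-tree -r -t $commit | grep -q $BAD_TREE; then
    echo $commit
  fi
done

# Step 2b: for a commit whose root tree is the bad tree, write a new commit
# object that is identical except for the tree line.
BAD_COMMIT=...   # one of the commits printed above
FIXED_COMMIT=$(git cat-file commit $BAD_COMMIT |
  sed "s/^tree $BAD_TREE/tree $FIXED_TREE/" |
  git hash-object -t commit -w --stdin)

# Step 3: register the replacement; filter-branch honours commit
# replacements and persists them.
git replace $BAD_COMMIT $FIXED_COMMIT
git filter-branch -- --all

If the bad tree is a subtree rather than a root tree, its parent trees have to be rebuilt with git mktree as well before the commit can be rewritten this way.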

Asked May 28 '19 by Tim


2 Answers

The final solution was to write a tool that tackles this problem.

The first step was to git unpack-objects all packfiles (see the sketch below). Then I had to identify the commits pointing to the trees with duplicate entries, by reading all refs and walking back through history checking all the trees. With that tooling in place it was not hard to rewrite the trees of those commits, then rewrite all commits after them, and finally update the changed refs. At this point I thoroughly tested the result, since nothing had been lost yet. A final git reflog expire --expire=now --all && git gc --prune=now --aggressive rewrote the pack and removed all loose objects that were no longer reachable.
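A rough sketch of the first and last steps in shell (the temporary directory is illustrative; the tree and commit rewriting in between is what the tool does):

# git unpack-objects will not unpack objects that are still present in a
# local pack, so move the packs out of the repository first.
mkdir /tmp/packs
mv .git/objects/pack/* /tmp/packs/
for pack in /tmp/packs/*.pack; do
  git unpack-objects < "$pack"
done

# ... rewrite the bad trees, the affected commits and all their
# descendants, then update the refs ...

# Finally, expire the reflog and repack, dropping everything unreachable.
git reflog expire --expire=now --all
git gc --prune=now --aggressive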

When I have the time I will upload the source code to GitHub, as it performs really well and could be a template for similar problems. It ran in only a few minutes on a 3.7 GB repository (about 20 GB unpacked). By now I have also implemented reading directly from the packfiles, so there is no need to unpack anything anymore (which takes a lot of time and space).

Update: I worked a little more on the source and it now performs really well, even better than BFG for deleting a single file (no option switches yet). The source code is available here: https://github.com/TimHeinrich/GitRewrite. Be aware that this was only tested against a single repository, and only under Windows on a Core i7. It is highly unlikely that it will work on Linux or with any other processor architecture.

Answered Sep 20 '22 by Tim


You can try running git fast-export to export your repository into a data file, and then run git fast-import to re-import the data file into a new repository. Git will remove any duplicate entries during the fast-import process, which will solve your problem.

Be aware that you may have to make a decision about how to handle signed tags and such when you export by passing appropriate arguments to git fast-export; since you're rewriting history, you probably want to pass --signed-tags=strip.
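A minimal sketch of that round trip (the file and directory names are illustrative):

# Export the complete history, stripping tag signatures since the
# rewritten objects would invalidate them anyway.
git fast-export --all --signed-tags=strip > /tmp/repo.fi

# Import into a fresh repository; fast-import writes canonical trees,
# which drops the duplicate entries.
git init --bare /tmp/clean-repo.git
cd /tmp/clean-repo.git
git fast-import < /tmp/repo.fi
git fsck   # should no longer report duplicateEntries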

Answered Sep 19 '22 by bk2204