
Tree contains duplicate file entries

After some issues with our hosting, we decided to move our Git repository to GitHub. So I cloned the repository and tried pushing that to GitHub. However, I stumbled upon some errors we have never encountered before:

    C:\repositories\appName [master]> git push -u origin master
    Counting objects: 54483, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (18430/18430), done.
    error: object 9eac1e639bbf890f4d1d52e04c32d72d5c29082e:contains duplicate file entries
    fatal: Error in object
    fatal: sha1 file '<stdout>' write error: Invalid arguments
    error: failed to push some refs to 'ssh://[email protected]/User/Project.git'

When I run fsck:

    C:\repositories\appName [master]> git fsck --full
    Checking object directories: 100% (256/256), done.
    error in tree 0db4b3eb0e0b9e3ee41842229cdc058f01cd9c32: contains duplicate file entries
    error in tree 9eac1e639bbf890f4d1d52e04c32d72d5c29082e: contains duplicate file entries
    error in tree 4ff6e424d9dd2e3a004d62c56f99e798ac27e7bf: contains duplicate file entries
    Checking objects: 100% (54581/54581), done.

When I run ls-tree with the bad SHA1:

    C:\repositories\appName [master]> git ls-tree 9eac1e639bbf890f4d1d52e04c32d72d5c29082e
    160000 commit 5de114491070a2ccc58ae8c8ac4bef61522e0667  MenuBundle
    040000 tree 9965718812098a5680e74d3abbfa26f527d4e1fb    MenuBundle

I tried all of the answers already given on this StackOverflow question, but haven't had any success. Is there any way I can prevent this repository and its history from being doomed?

user1791257 asked Nov 01 '12 14:11



2 Answers

Method 1.

Run git fsck first to find the broken objects.

    $ git fsck --full
    error in tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29: contains duplicate file entries

git fsck only reports the problem; it does not repair it. If it reports errors like the one above, you can either ignore the problem, restore the repository from a backup, or move the files into a new repository. If you are having trouble pushing the repo to GitHub, try pushing to a different repository or check: Can't push to GitHub error: pack-objects died of signal 13 and Can't push new git repository to github.
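To work through the reported trees one at a time, it can help to pull just the hashes out of the fsck output. A minimal sketch, assuming the error lines look like the one above (newer Git versions may insert a message ID before the text, which the pattern tolerates); the function name `dup_trees` is made up for illustration:

```shell
# Print the hash of every tree that fsck reports as having duplicate entries.
# Reads fsck output on stdin; assumes lines of the form
#   error in tree <sha1>: [...]contains duplicate file entries
dup_trees() {
    sed -n 's/^error in tree \([0-9a-f]\{40\}\): .*contains duplicate file entries.*$/\1/p'
}

# Possible use inside a repository (2>&1 because the errors may go to stderr):
#   git fsck --full 2>&1 | dup_trees
```

Each printed hash can then be passed to git ls-tree as in the methods below.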


The methods below are only for advanced Git users. Back up the repository before starting. These steps are not guaranteed to fix the problem and can make it worse, so proceed at your own risk or for educational purposes.


Method 2.

Use git ls-tree to identify duplicate files.

    $ git read-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 # Just a hint.
    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 # Try also with: --full-tree -rt -l
    160000 commit def08273a99cc8d965a20a8946f02f8b247eaa66  commerce_coupon_per_user
    100644 blob 89a5293b512e28ffbaac1d66dfa1428d5ae65ce0    commerce_coupon_per_user
    100644 blob 2f527480ce0009dda7766647e36f5e71dc48213b    commerce_coupon_per_user
    100644 blob dfdd2a0b740f8cd681a6e7aa0a65a0691d7e6059    commerce_coupon_per_user
    100644 blob 45886c0eda2ef57f92f962670fad331e80658b16    commerce_coupon_per_user
    100644 blob 9f81b5ca62ed86c1a2363a46e1e68da1c7b452ee    commerce_coupon_per_user

As you can see, it contains the duplicated file entries (commerce_coupon_per_user)!

    $ git show bb81a5af7e9203f36c3201f2736fca77ab7c8f29
    tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29

    commerce_coupon_per_user
    commerce_coupon_per_user
    commerce_coupon_per_user
    commerce_coupon_per_user
    commerce_coupon_per_user
    commerce_coupon_per_user

Again, you can see the duplicated file entries (commerce_coupon_per_user)!
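On a large tree the duplicates are harder to spot by eye. A small sketch that lists the names occurring more than once, assuming the default git ls-tree output format (metadata, a tab, then the name); the function name `dup_names` is hypothetical:

```shell
# Print each entry name that occurs more than once in `git ls-tree` output.
# Input lines look like "<mode> <type> <hash>TAB<name>", so the name is
# everything after the first tab. Each duplicated name is printed once.
dup_names() {
    awk -F'\t' '++seen[$2] == 2 { print $2 }'
}

# Possible use inside a repository:
#   git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 | dup_names
```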

You may try to use git show for each listed blob and check the content of each file.

Then keep running ls-tree for that invalid tree object across your different git clones to see if you can track down a valid copy of the object, or if all are broken.

    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29

If you found the valid object containing non-duplicated file entries, save it into a file and re-create it by using `git mktree` and `git replace`, e.g.

    remote$ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 > working_tree.txt
    $ cat working_tree.txt | git mktree
    NEWTREEbb81a5af7e9203f36c3201f2736fca77ab7c8f29
    $ git replace bb81a5af7e9203f36c3201f2736fca77ab7c8f29 NEWTREE4b825dc642cb6eb9a060e54bf8d69288fbee4904

If this doesn't help, you can undo the change by deleting the replacement ref, which is named after the original (replaced) object:

    $ git replace -d bb81a5af7e9203f36c3201f2736fca77ab7c8f29
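If no clone has a clean copy of the tree, one option is to build the listing yourself by dropping the extra entries before calling git mktree. The sketch below does this in a throwaway repository; the "first entry wins" rule and the names `first_entry_per_name`, `demo_repo`, and `same_name` are assumptions for illustration only, and which duplicate to keep is really a judgment call.

```shell
# Keep only the first entry for each name in `git ls-tree` output.
first_entry_per_name() {
    awk -F'\t' '!seen[$2]++'
}

# Demo in a scratch repository so nothing real is touched.
demo_repo=$(mktemp -d)
git init -q "$demo_repo"
blob_a=$(printf 'one\n' | git -C "$demo_repo" hash-object -w --stdin)
blob_b=$(printf 'two\n' | git -C "$demo_repo" hash-object -w --stdin)

# A listing with a duplicated name, similar to the broken tree above,
# filtered and then written as a new tree object.
new_tree=$(printf '100644 blob %s\tsame_name\n100644 blob %s\tsame_name\n' \
               "$blob_a" "$blob_b" |
           first_entry_per_name |
           git -C "$demo_repo" mktree)

# The new tree should list the name only once.
git -C "$demo_repo" ls-tree "$new_tree"
```

In the real repository, the resulting hash would take the place of the NEWTREE placeholder in the git replace command above.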

Method 3.

When you know which file/dir entry is duplicated, you may try to remove that file and re-create it later on. For example:

    $ find . -name commerce_coupon_per_user # Find the duplicate entry.
    $ git rm --cached `find . -name commerce_coupon_per_user` # Add -r for the dir.
    $ git commit -m'Removing invalid git entry for now.' -a
    $ git gc --aggressive --prune # Deletes loose objects! Please do the backup before just in case.

Read more:

  • git gc: cleaning up after yourself

Method 4.

Check your commit for invalid entries.

Let's check our tree again.

    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 --full-tree -rt -l
    160000 commit def08273a99cc8d965a20a8946f02f8b247eaa66  commerce_coupon_per_user
    100644 blob 89a5293b512e28ffbaac1d66dfa1428d5ae65ce0     270    commerce_coupon_per_user
    ....
    $ git show def08273a99cc8d965a20a8946f02f8b247eaa66
    fatal: bad object def08273a99cc8d965a20a8946f02f8b247eaa66
    $ git cat-file commit def08273a99cc8d965a20a8946f02f8b247eaa66
    fatal: git cat-file def08273a99cc8d965a20a8946f02f8b247eaa66: bad file

It seems the above commit is invalid, so let's scan our git log for this commit using one of the following commands to check what's going on:

    $ git log -C3 --patch | less +/def08273a99cc8d965a20a8946f02f8b247eaa66
    $ git log -C3 --patch | grep -C10 def08273a99cc8d965a20a8946f02f8b247eaa66

    commit 505446e02c68fe306aec5b0dc2ccb75b274c75a9
    Date:   Thu Jul 3 16:06:25 2014 +0100

        Added dir.

    new file mode 160000
    index 0000000..def0827
    --- /dev/null
    +++ b/sandbox/commerce_coupon_per_user
    @@ -0,0 +1 @@
    +Subproject commit def08273a99cc8d965a20a8946f02f8b247eaa66

In this particular case, our commit points to a bad object because it was committed as part of a Git submodule which doesn't exist anymore (check git submodule status).
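Submodule entries are the ones with mode 160000 (gitlinks). A short sketch to pick them out of a git ls-tree listing, assuming the default or -l output format; the function name `gitlink_entries` is made up. Keep in mind that a gitlink's commit normally lives in the submodule's own repository, so git show failing on it in the parent repository is not by itself proof of corruption.

```shell
# Print "<commit-hash>TAB<name>" for every gitlink (submodule) entry in
# `git ls-tree` output. Gitlinks are the entries whose mode is 160000.
gitlink_entries() {
    awk -F'\t' '{
        split($1, meta, " ")          # meta[1]=mode, meta[2]=type, meta[3]=hash
        if (meta[1] == "160000") print meta[3] "\t" $2
    }'
}

# Possible use inside a repository:
#   git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 | gitlink_entries
```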

You may exclude that invalid object from the ls-tree and re-create tree without this bad object by e.g.:

    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 | grep -v def08273a99cc8d965a20a8946f02f8b247eaa66 | git mktree
    b964946faf34468cb2ee8e2f24794ae1da1ebe20
    $ git replace bb81a5af7e9203f36c3201f2736fca77ab7c8f29 b964946faf34468cb2ee8e2f24794ae1da1ebe20
    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 # Re-test.
    $ git fsck --full

Note: The old object will still report duplicate file entries, but if you now have duplicates in the new tree as well, you need to remove more entries from that tree. So:

    $ git replace # List replace objects.
    bb81a5af7e9203f36c3201f2736fca77ab7c8f29
    $ git replace -d bb81a5af7e9203f36c3201f2736fca77ab7c8f29 # Remove previously replaced object.

Now let's try to remove all commits and blobs from that tree, and replace it again:

    $ git ls-tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29 | grep -ve commit -e blob | git mktree
    4b825dc642cb6eb9a060e54bf8d69288fbee4904
    $ git replace bb81a5af7e9203f36c3201f2736fca77ab7c8f29 4b825dc642cb6eb9a060e54bf8d69288fbee4904

Now you have an empty tree in place of that invalid entry.

    $ git status # Check if everything is fine.
    $ git show 4b825dc642cb6eb9a060e54bf8d69288fbee4904 # Re-check
    $ git ls-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 --full-tree # Re-check

If you see unexpected staged changes, reset your repository by:

$ git reset HEAD --hard 

If you get the following message:

HEAD is now at 5a4ed8e Some message at bb81a5af7e9203f36c3201f2736fca77ab7c8f29 

Do the rebase and remove that commit (by changing pick to edit):

    $ git rebase -i
    $ git commit -m'Fixed invalid commit.' -a
    rebase in progress; onto 691f725
    You are currently editing a commit while rebasing branch 'dev' on '691f725'.
    $ git rebase --continue
    $ git reset --hard
    $ git reset HEAD --hard
    $ git reset origin/master --hard

Method 5.

Try removing and squashing invalid commits containing invalid objects.

$ git rebase -i HEAD~100 # 100 commits behind HEAD, increase if required. 
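Before rebasing, it helps to know which commits actually reference the bad tree. A rough sketch, assuming git ls-tree can still walk the tree in question and that the full 40-character hash is passed in; the function name `commits_with_tree` is hypothetical, and the scan visits all history, so it can be slow on large repositories:

```shell
# Print every commit (reachable from any ref) whose root tree or any
# subtree is the given tree hash.
commits_with_tree() {
    target=$1
    git rev-list --all | while read -r commit; do
        # Column 3 of `ls-tree -r -t` holds the hash of each subtree and blob.
        if git ls-tree -r -t "$commit" | awk '{ print $3 }' | grep -qx "$target" ||
           [ "$(git rev-parse "$commit^{tree}")" = "$target" ]; then
            printf '%s\n' "$commit"
        fi
    done
}

# Possible use inside a repository:
#   commits_with_tree bb81a5af7e9203f36c3201f2736fca77ab7c8f29
```

The oldest commit in the output gives a lower bound for how far back the interactive rebase has to reach.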

Read more: Git Tools - Rewriting History and How do I rebase while skipping a particular commit?


Method 6.

Identify the invalid Git objects for manual removal using the following methods:

  • for loose (unpacked) objects (drop the first two characters of the hash, as Git uses them for the directory name):

    $ find . -name 81a5af7e9203f36c3201f2736fca77ab7c8f29
  • for packed objects:

    $ find . -name \*.idx -exec cat {} \; | git show-index | grep bb81a5af7e9203f36c3201f2736fca77ab7c8f29 # Then you need to find the file manually.
    $ git unpack-objects < $FILE # Expand the particular file (the pack is read from stdin).
    $ git unpack-objects < .git/objects/pack/pack-*.pack # Expand all.

See: How to unpack all objects of a git repository?
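The directory split for loose objects can be written down as a tiny helper. A sketch assuming the default .git layout; the function name `loose_object_path` is made up for illustration:

```shell
# Print the path where a loose object with the given hash would be stored,
# relative to the repository root. Assumes the default .git directory layout.
loose_object_path() {
    hash=$1
    rest=${hash#??}          # hash without its first two characters
    dir=${hash%"$rest"}      # just the first two characters
    printf '.git/objects/%s/%s\n' "$dir" "$rest"
}

loose_object_path bb81a5af7e9203f36c3201f2736fca77ab7c8f29
# prints .git/objects/bb/81a5af7e9203f36c3201f2736fca77ab7c8f29
```

If that file does not exist, the object is either missing or stored in a pack.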


Related:

  • Git FAQ: How to fix a broken repository?
  • [SA] git tree contains duplicate file entries
  • [SA] How do you restore a corrupted object in a git repository (for newbies)?
  • [SA] How can I manually remove a blob object from a tree in Git?
  • [SA] How can I recover my Git repository for a "missing tree" error?
  • [SA] How to view git objects and index without using git
  • [SA] Git recovery: "object file is empty". How to recreate trees?
  • [SA] Tree contains duplicate file entries
  • [SA] git tree (still) contains duplicates and an erroneous signal 13
  • On undoing, fixing, or removing commits in git
kenorb answered Oct 06 '22 18:10



Note: Git 2.1 adds two options to git replace which can be useful when modifying a corrupted entry in a git repo:

  • commit 4e4b125 by Christian Couder (chriscool)

    --edit <object> 

Edit an object's content interactively. The existing content for <object> is pretty-printed into a temporary file, an editor is launched on the file, and the result is parsed to create a new object of the same type as <object>.
A replacement ref is then created to replace <object> with the newly created object.
See git-var for details about how the editor will be chosen.

And commit 2deda62 by Jeff King (peff):

replace: add a --raw mode for --edit

One of the purposes of "git replace --edit" is to help a user repair objects which are malformed or corrupted.
Usually we pretty-print trees with "ls-tree", which is much easier to work with than the raw binary data.

However, some forms of corruption break the tree-walker, in which case our pretty-printing fails, rendering "--edit" useless for the user.

This patch introduces a "--raw" option, which lets you edit the binary data in these instances.

Knowing how Jeff usually debugs Git (like in this case), I am not too surprised to see this option.


Note that before Git 2.27 (Q2 2020), "git fsck" ensured that the paths recorded in tree objects were sorted and without duplicates, but it failed to notice a case where a blob is followed by entries that sort before a tree with the same name.

This has been corrected.

See commit 9068cfb (10 May 2020) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit 0498840, 14 May 2020)

fsck: report non-consecutive duplicate names in trees

Suggested-by: Brandon Williams
Original-test-by: Brandon Williams
Signed-off-by: René Scharfe
Reviewed-by: Luke Diamand

Tree entries are sorted in path order, meaning that directory names get a slash ('/') appended implicitly.

Git fsck checks if trees contains consecutive duplicates, but due to that ordering there can be non-consecutive duplicates as well if one of them is a directory and the other one isn't.

Such a tree cannot be fully checked out.

Find these duplicates by recording candidate file names on a stack and check candidate directory names against that stack to find matches.
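The ordering described in that commit message can be reproduced with a small sketch. It takes simplified "<type> <name>" lines rather than real tree data, and the function name `tree_order` is an assumption for illustration:

```shell
# Sort entry names the way the commit message describes: directory names are
# compared as if they ended in "/". Input lines are "<type> <name>" with type
# "tree" or "blob"; names containing spaces are not handled in this sketch.
tree_order() {
    awk '{ print ($1 == "tree" ? $2 "/" : $2) }' | LC_ALL=C sort
}

# A file "a", a directory "a", and a file "a.c" in the same tree:
printf '%s\n' 'tree a' 'blob a.c' 'blob a' | tree_order
# prints:
#   a
#   a.c
#   a/
```

Because "." sorts before "/", the file a.c lands between the two entries named a, so a check that only compares neighbouring entries misses the duplicate.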


With Git 2.30 (Q1 2021), the logic to deal with a repack operation that ended up creating the same packfile has been simplified.

See commit 2fcb03b (17 Nov 2020), and commit 704c4a5 (16 Nov 2020) by Taylor Blau (ttaylorr).
See commit 63f4d5c (16 Nov 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 39d38a5, 03 Dec 2020)

builtin/repack.c: don't move existing packs out of the way

Helped-by: Jeff King
Signed-off-by: Taylor Blau

When 'git repack'(man) creates a pack with the same name as any existing pack, it moves the existing one to 'old-pack-xxx.{pack,idx,...}' and then renames the new one into place.

Eventually, it would be nice to have 'git repack'(man) allow for writing a multi-pack index at the critical time (after the new packs have been written / moved into place, but before the old ones have been deleted). Guessing that this option might be called '--write-midx', this makes the following situation (where repacks are issued back-to-back without any new objects) impossible:

    $ git repack -adb
    $ git repack -adb --write-midx

In the second repack, the existing packs are overwritten verbatim with the same rename-to-old sequence. At that point, the current MIDX is invalidated, since it refers to now-missing packs. So that code wants to be run after the MIDX is re-written. But (prior to this patch) the new MIDX can't be written until the new packs are moved into place. So, we have a circular dependency.

This is all hypothetical, since no code currently exists to write a MIDX safely during a 'git repack(man) ' (the 'GIT_TEST_MULTI_PACK_INDEX' does so unsafely). Putting hypothetical aside, though: why do we need to rename existing packs to be prefixed with 'old-' anyway?

This behavior dates all the way back to 2ad47d6 ("git-repack: Be careful when updating the same pack as an existing one.", 2006-06-25, Git v1.4.1 -- merge). 2ad47d6 is mainly concerned about a case where a newly written pack would have a different structure than its index. This used to be possible when the pack name was a hash of the set of objects. Under this naming scheme, two packs that store the same set of objects could differ in delta selection, object positioning, or both. If this happened, then any such packs would be unreadable in the instant between copying the new pack and new index (i.e., either the index or pack will be stale depending on the order that they were copied).

But since 1190a1a ("pack-objects: name pack files after trailer hash", 2013-12-05, Git v1.9-rc0 -- merge), this is no longer possible, since pack files are named not after their logical contents (i.e., the set of objects), but by the actual checksum of their contents.
So, this old- behavior can safely go, which allows us to avoid our circular dependency above.

In addition to avoiding the circular dependency, this patch also makes 'git repack'(man) a lot simpler, since we don't have to deal with failures encountered when renaming existing packs to be prefixed with 'old-'.

This patch is mostly limited to removing code paths that deal with the 'old' prefixing, with the exception of files that include the pack's name in their own filename, like .idx, .bitmap, and related files. The exception is that we want to continue to trust what pack-objects wrote. That is, it is not the case that we pretend as if pack-objects didn't write files identical to ones that already exist, but rather that we respect what pack-objects wrote as the source of truth. That cuts two ways:

  • If pack-objects produced an identical pack to one that already exists with a bitmap, but did not produce a bitmap, we remove the bitmap that already exists. (This behavior is codified in t7700.14).
  • If pack-objects produced an identical pack to one that already exists, we trust the just-written version of the corresponding .idx, .promisor, and other files over the ones that already exist. This ensures that we use the most up-to-date versions of this files, which is safe even in the face of format changes in, say, the .idx file (which would not be reflected in the .idx file's name).

When rebuilding the multi-pack index file reusing an existing one, we used to blindly trust the existing file and ended up carrying corrupted data into the updated file, which has been corrected with Git 2.33 (Q3 2021).

See commit f89ecf7, commit ec1e28e, commit 15316a4, commit f9221e2 (23 Jun 2021) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 3b57e72, 16 Jul 2021)

midx: report checksum mismatches during 'verify'

Suggested-by: Derrick Stolee
Signed-off-by: Taylor Blau

'git multi-pack-index verify'(man) inspects the data in an existing MIDX for correctness by checking that the recorded object offsets are correct, and so on.

But it does not check that the file's trailing checksum matches the data that it records.
So, if an on-disk corruption happened to occur in the final few bytes (and all other data was recorded correctly), we would:

  • get a clean result from 'git multi-pack-index verify', but
  • be unable to reuse the existing MIDX when writing a new one (since we now check for checksum mismatches before reusing a MIDX)

Teach the 'verify' sub-command to recognize corruption in the checksum by calling midx_checksum_valid().


With Git 2.34 (Q4 2021), "git repack"(man) has been taught to generate multi-pack reachability bitmaps.

See commit e861b09 (06 Oct 2021) by Jeff King (peff).
See commit 324efc9 (01 Oct 2021), and commit 6d08b9d, commit 1d89d88, commit 5f18e31, commit a169166, commit 90f838b, commit 08944d1, commit 6fb22ca, commit 56d863e (28 Sep 2021) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 0b69bb0, 18 Oct 2021)

builtin/repack.c: support writing a MIDX while repacking

Signed-off-by: Taylor Blau

Teach git repack(man) a new --write-midx option for callers that wish to persist a multi-pack index in their repository while repacking.

There are two existing alternatives to this new flag, but they don't cover our particular use-case.
These alternatives are:

  • Call 'git multi-pack-index write'(man) after running 'git repack', or
  • Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running 'git repack'.

The former works, but introduces a gap in bitmap coverage between repacking and writing a new MIDX (since the repack may have deleted a pack included in the existing MIDX, invalidating it altogether).

Introduce a new option which eliminates this race by teaching git repack to generate the MIDX at the critical point: after the new packs have been written and moved into place, but before the redundant packs have been removed.

This option is compatible with git repack's '--bitmap' option (it changes the interpretation to be: "write a bitmap corresponding to the MIDX after one has been generated").

The MIDX code does not handle this, so avoid trying to generate a MIDX covering zero packs in the first place.

git repack now includes in its man page:

This option has no effect if multiple packfiles are created, unless writing a MIDX (in which case a multi-pack bitmap is created).

git repack also now includes in its man page:

-m

--write-midx

Write a multi-pack index (see git multi-pack-index) containing the non-redundant packs.

VonC answered Oct 06 '22 19:10
