Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I fix a corrupted Git repository?

Tags:

git

People also ask

What is fsck in git?

It stands for File System ChecK. The name is taken from the Unix fsck command, which is used to validate a file system.

What is the effect if a file in the main repository becomes corrupted in git?

if your local . git folder gets corrupted, that corruption won't be propagated upstream to the server, so you should always be able to get a clean copy, minus your most recent changes. as for your question, most of the files in . git are not changed when you add a new commit.

How do I fix repository not found?

Fix 5 - Git - remote: Repository not found Just run the git remote update command which updates the local repo with the remote repo and its credentials. If this command is executed without any issues then your issue will be resolved.


As an alternative to Todd's last option (Full Restores and Re-Initialization), if only the local repository is corrupted, and you know the URL to the remote, you can use this to reset your .git to match the remote (replacing ${url} with the remote URL):

mv -v .git .git_old &&            # Remove old Git files
git init &&                       # Initialise new repository
git remote add origin "${url}" && # Link to old repository
git fetch &&                      # Get old history
# Note that some repositories use 'master' in place of 'main'. Change the following line if your remote uses 'master'.
git reset origin/main --mixed     # Force update to old history.

This leaves your working tree intact, and only affects Git's bookkeeping.

I also recently made a Bash script for this very purpose (Appendix A), which wraps a bit of safety around this operation.

Note:

  • If your repository has submodules, this process will mess them up somehow, and the only solution I've found so far is deleting them and then using git submodule update --init (or recloning the repository, but that seems too drastic).
  • This tries to determine the correct choice between 'main' and 'master' depending on local configuration settings, however there may be some issues if used on a repository that uses 'master', on a machine that has 'main' as the default branch.
  • This uses wget to check that the url is reachable before doing anything. This is not necessarily the best operation to determine that a site is reachable, and if you haven't got wget available, this can likely be replaced with ping -c 1 "${url_base}" (linux), ping -n 1 "${url_base}" (windows), or curl -Is "${url_base}"

Appendix A - Full script

Also published as a gist, though it is now out of date.

#!/bin/bash

# Usage: fix-git [REMOTE-URL]
#   Must be run from the root directory of the repository.
#   If a remote is not supplied, it will be read from .git/config
#
# For when you have a corrupted local repo, but a trusted remote.
# This script replaces all your history with that of the remote.
# If there is a .git, it is backed up as .git_old, removing the last backup.
# This does not affect your working tree.
#
# This does not currently work with submodules!
# This will abort if a suspected submodule is found.
# You will have to delete them first
# and re-clone them after (with `git submodule update --init`)
#
# Error codes:
# 1: If a URL is not supplied, and one cannot be read from .git/config
# 4: If the URL cannot be reached
# 5: If a Git submodule is detected


if [[ "$(find -name .git -not -path ./.git | wc -l)" -gt 0 ]] ;
then
    echo "It looks like this repo uses submodules" >&2
    echo "You will need to remove them before this script can safely execute" >&2
    echo "Then use \`git submodule update --init\` to re-clone them" >&2
    exit 5
fi

if [[ $# -ge 1 ]] ;
then
    url="$1"
else
    if ! url="$(git config --local --get remote.origin.url)" ;
    then
        echo "Unable to find remote 'origin': missing in '.git/config'" >&2
        exit 1
    fi
fi

if ! branch_default="$(git config --get init.defaultBranch)" ;
then
    # if the defaultBranch config option isn't present, then it's likely an old version of git that uses "master" by default
    branch_default="master"
fi

url_base="$(echo "${url}" | sed -E 's;^([^/]*://)?([^/]*)(/.*)?$;\2;')"
echo "Attempting to access ${url_base} before continuing"
if ! wget -p "${url_base}" -O /dev/null -q --dns-timeout=5 --connect-timeout=5 ;
then
    echo "Unable to reach ${url_base}: Aborting before any damage is done" >&2
    exit 4
fi

echo
echo "This operation will replace the local repo with the remote at:"
echo "${url}"
echo
echo "This will completely rewrite history,"
echo "but will leave your working tree intact"
echo -n "Are you sure? (y/N): "

read confirm
if ! [ -t 0 ] ; # i'm open in a pipe
then
    # print the piped input
    echo "${confirm}"
fi
if echo "${confirm}"|grep -Eq "[Yy]+[EeSs]*" ; # it looks like a yes
then
    if [[ -e .git ]] ;
    then
        # remove old backup
        rm -vrf .git_old | tail -n 1 &&
        # backup .git iff it exists
        mv -v .git .git_old
    fi &&
    git init &&
    git remote add origin "${url}" &&
    git config --local --get remote.origin.url | sed 's/^/Added remote origin at /' &&
    git fetch &&
    git reset "origin/${branch_default}" --mixed
else
    echo "Aborting without doing anything"
fi

TL;DR

Git doesn't really store history the way you think it does. It calculates history at run-time based on an ancestor chain. If your ancestry is missing blobs, trees, or commits then you may not be able to fully recover your history.

Restore Missing Objects from Backups

The first thing you can try is to restore the missing items from backup. For example, see if you have a backup of the commit stored as .git/objects/98/4c11abfc9c2839b386f29c574d9e03383fa589. If so you can restore it.

You may also want to look into git-verify-pack and git-unpack-objects in the event that the commit has already been packed up and you want to return it to a loose object for the purposes of repository surgery.

Surgical Resection

If you can't replace the missing items from a backup, you may be able to excise the missing history. For example, you might examine your history or reflog to find an ancestor of commit 984c11abfc9c2839b386f29c574d9e03383fa589. If you find one intact, then:

  1. Copy your Git working directory to a temporary directory somewhere.
  2. Do a hard reset to the uncorrupted commit.
  3. Copy your current files back into the Git work tree, but make sure you don't copy the .git folder back!
  4. Commit the current work tree, and do your best to treat it as a squashed commit of all the missing history.

If it works, you will of course lose the intervening history. At this point, if you have a working history log, then it's a good idea to prune your history and reflogs of all unreachable commits and objects.

Full Restores and Re-Initialization

If your repository is still broken, then hopefully you have an uncorrupted backup or clone you can restore from. If not, but your current working directory contains valid files, then you can always re-initialize Git. For example:

rm -rf .git
git init
git add .
git commit -m 'Re-initialize repository without old history.'

It's drastic, but it may be your only option if your repository history is truly unrecoverable. YMMV.


Before trying any of the fixes described on this page, I would advise to make a copy of your repository and work on this copy only. Then at the end if you can fix it, compare it with the original to ensure you did not lose any file in the repair process.

Another alternative which worked for me was to reset the Git head and index to its previous state using:

git reset --keep

You can also do the same manually by opening the Git GUI and selecting each "Staged changes" and click on "Unstage the change". When everything is unstaged, you should now be able to compress your database, check your database and commit.

I also tried the following commands, but they did not work for me. But they might for you depending on the exact issue you have:

git reset --mixed
git fsck --full
git gc --auto
git prune --expire now
git reflog --all

Finally, to avoid this problem of synchronization damaging your Git index (which can happen with Dropbox, SpiderOak, or any other cloud disk), you can do the following:

  1. Convert your .git folder into a single "bundle" Git file by using: git bundle create my_repo.git --all, then it should work just the same as before, but since everything is in a single file you won't risk the synchronization damaging your git repo any more.
  2. Disable instantaneous synchronization: SpiderOak allows you to set the scheduling for checking changes to "automatic" (which means that it is as soon as possible, being monitoring file changes thanks to the OS notifications). This is bad, because it will start to upload changes as soon as you are doing a change, and then download the change, so it might erase the latest changes you were just doing. A solution to fix this issue is to set the changes monitoring delay to 5 minutes or more. This also fixes issues with instant saving note taking applications (such as Notepad++).

Here's a script (Bash) to automate the first solution by @CodeGnome to restore from a backup (run from the top level of the corrupted repository). The backup doesn't need to be complete; it only needs to have the missing objects.

git fsck 2>&1 | grep -e missing -e invalid | awk '{print $NF}' | sort -u |
    while read entry; do
        mkdir -p .git/objects/${entry:0:2}
        cp ${BACKUP}/objects/${entry:0:2}/${entry:2} .git/objects/${entry:0:2}/${entry:2}
    done