
How to reduce the depth of an existing git clone?

Tags: git

I have a clone. I want to reduce the history on it, without cloning from scratch with a reduced depth. Worked example:

$ git clone git@github.com:apache/spark.git
# ...
$ cd spark/
$ du -hs .git
193M    .git

OK, so that's not so big, but it'll serve for this discussion. If I try gc, it gets smaller:

$ git gc --aggressive
Counting objects: 380616, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (278136/278136), done.
Writing objects: 100% (380616/380616), done.
Total 380616 (delta 182748), reused 192702 (delta 0)
Checking connectivity: 380616, done.
$ du -hs .git
108M    .git

Still, pretty big though (git pull suggests that it's still push/pullable to the remote). How about repack?

$ git repack -a -d --depth=5
Counting objects: 380616, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (95388/95388), done.
Writing objects: 100% (380616/380616), done.
Total 380616 (delta 182748), reused 380616 (delta 182748)
Pauls-MBA:spark paul$ du -hs .git
108M    .git

Yup, didn't get any smaller. --depth for repack (which limits delta chain length) isn't the same as --depth for clone (which truncates history):

$ git clone --depth 1 git@github.com:apache/spark.git
Cloning into 'spark'...
remote: Counting objects: 8520, done.
remote: Compressing objects: 100% (6611/6611), done.
remote: Total 8520 (delta 1448), reused 5101 (delta 710), pack-reused 0
Receiving objects: 100% (8520/8520), 14.82 MiB | 3.63 MiB/s, done.
Resolving deltas: 100% (1448/1448), done.
Checking connectivity... done.
Checking out files: 100% (13386/13386), done.
$ cd spark
$ du -hs .git
17M    .git

Git pull says it's still in step with the remote, which surprises nobody.
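For anyone who wants to see the difference between the two --depth options without touching a real checkout, here is a minimal sketch. It assumes a throwaway repository built in a temporary directory; the directory names, the 20 commits and the "Demo" identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: repack --depth bounds delta chains, clone --depth truncates history.
# Everything lives in a temporary directory (hypothetical demo data).
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Build a small repository with 20 commits.
git init -q "$tmp/origin"
cd "$tmp/origin"
git config user.name "Demo"
git config user.email "demo@example.com"
i=1
while [ "$i" -le 20 ]; do
    echo "line $i" >> file.txt
    git add file.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Repacking with --depth=5 rewrites the pack but keeps every commit.
git repack -a -d -q --depth=5
after_repack=$(git rev-list --count HEAD)

# A shallow clone over file:// keeps only the newest commit.
git clone -q --depth 1 "file://$tmp/origin" "$tmp/shallow"
after_clone=$(git -C "$tmp/shallow" rev-list --count HEAD)

echo "commits visible after repack --depth=5: $after_repack"
echo "commits visible after clone --depth 1:  $after_clone"
```

The file:// URL matters here: with a plain local path, git clone ignores --depth and copies the whole object store.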

OK - so how to change an existing clone to a shallow clone, without nixing it and checking it out afresh?

paul_h asked Jul 03 '16 16:07



2 Answers

since at least git version 2.14.1 (september 2017) there is

git fetch --depth 10 

this will fetch the newest commits from origin (if there are any) and then cut off the local history to depth of 10 (if it was longer).
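A minimal sketch of that behaviour on a throwaway repository, in case you want to try it before running it on a real clone. The temporary directories, the 30 commits and the "Demo" identity are assumptions made up for the demo; only git fetch --depth comes from the answer.

```shell
#!/bin/sh
# Sketch: turn an existing full clone into a shallow one with git fetch --depth.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical "remote" with 30 commits.
git init -q "$tmp/origin"
cd "$tmp/origin"
git config user.name "Demo"
git config user.email "demo@example.com"
i=1
while [ "$i" -le 30 ]; do
    echo "line $i" >> file.txt
    git add file.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Full clone over file:// so the regular fetch transport is used.
git clone -q "file://$tmp/origin" "$tmp/work"
cd "$tmp/work"
before=$(git rev-list --count HEAD)

# Cut the visible history down to the 10 newest commits.
git fetch -q --depth 10
after=$(git rev-list --count HEAD)

echo "visible commits before: $before"
echo "visible commits after:  $after"
if [ -f .git/shallow ]; then
    echo "the clone is now shallow (.git/shallow exists)"
fi
```

The count drops because git records the new history boundary in .git/shallow; the objects behind that boundary are not deleted by the fetch itself.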

for normal purposes your git history is now at length of 10. but beware that the old commits still linger in your local repository and that they still exist in the remote repository.

if your aim was to have a shorter log because you currently don't need years worth of commit history then you are done. your log will be short and most common git commands now only see 10 commits.

if your aim was to free disk space because older commits have huge binary blobs which you don't need to work now then you have to actually remove the old commits from your local repository. see below for a short description how to do so.

if your aim was to actually remove the old commits (for example to remove a password from old commits) then you need to remove the commits from the remote repository. also from all clones of the remote repository. see below for links with more info on how to remove commits from remote repo.


how to remove the old commits to free disk space.

data loss warning! read the notes and pay attention to what you are doing.

in short: to actually remove the commits to free the disk space you need to remove all references that are holding them. that is (as far as i know) the reflog and the tags. also branches and stashes.

to clear the reflog:

git reflog expire --expire=all --all 

to remove all tags:

git tag -l | xargs git tag -d 

branches are a bit more complicated than tags. think for yourself how to handle your branches.

as for stashes; they should be temporary anyways. so just drop them like it's hot. note that git stash drop removes only the most recent stash entry; to remove all of them use:

git stash clear 

once you have removed all references you can call git garbage collector to remove dangling commits:

git gc --prune=all 

now the old commits should be completely removed from disk.
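The steps above can be sketched end to end on a throwaway repository. This is a rough illustration, not a recipe to paste into a real clone: the temporary directories, the 30 commits, the "Demo" identity and the tag name v-old are all made up, and the sketch uses --prune=now (the spelling given in the git gc documentation) where the text above uses --prune=all.

```shell
#!/bin/sh
# Sketch: shallow an existing clone, drop the references that keep old
# commits alive, then garbage-collect. Demo data only.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical "remote" with 30 commits and a tag on an old commit.
git init -q "$tmp/origin"
cd "$tmp/origin"
git config user.name "Demo"
git config user.email "demo@example.com"
i=1
while [ "$i" -le 30 ]; do
    echo "line $i" >> file.txt
    git add file.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 3 ]; then
        git tag v-old
    fi
    i=$((i + 1))
done

git clone -q "file://$tmp/origin" "$tmp/work"
cd "$tmp/work"
oldest=$(git rev-list --max-parents=0 HEAD)   # root commit of the full history

git fetch -q --depth 5                        # shorten visible history
git reflog expire --expire=all --all          # clear the reflog
git tag -l | xargs git tag -d > /dev/null     # remove all local tags
git gc -q --prune=now                         # drop unreachable objects

remaining=$(git rev-list --count HEAD)
if git cat-file -e "$oldest" 2>/dev/null; then
    oldest_state="still present"
else
    oldest_state="pruned"
fi
echo "visible commits: $remaining"
echo "root commit object: $oldest_state"
```

In this demo the tag v-old is what would keep the first three commits on disk, so skipping the tag removal step leaves the root commit in place.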

note about the remove all tags command: the command will remove all tags from your local repository. if all your tags are also on the remote then this is fine. the next git fetch will refetch the relevant tags. but if you have tags which exist only in your local repository then you need to back them up somehow before deleting them.
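One way to find out which tags exist only locally is to compare git tag -l against git ls-remote. The sketch below does that on a throwaway repository; the directories, the tag names shared-tag and local-only-tag, and the "Demo" identity are invented for the demo, and it assumes the remote is called origin.

```shell
#!/bin/sh
# Sketch: list tags that exist in the local clone but not on origin.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical "remote" with one commit and one tag.
git init -q "$tmp/origin"
cd "$tmp/origin"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "first commit"
git tag shared-tag

# Clone it and add a tag that never gets pushed.
git clone -q "file://$tmp/origin" "$tmp/work"
cd "$tmp/work"
git tag local-only-tag

# Compare the sorted tag names on both sides.
git tag -l | sort > "$tmp/local-tags"
git ls-remote --tags --refs origin | sed 's|.*refs/tags/||' | sort > "$tmp/remote-tags"
local_only=$(comm -23 "$tmp/local-tags" "$tmp/remote-tags")

echo "tags that exist only locally:"
echo "$local_only"
```

Anything this prints would be lost by the remove all tags command unless you push it or note down the commit it points to first.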

reflog entries expire automatically when git gc runs (by default after 90 days, or after 30 days for entries that are no longer reachable from the current branch tip). tags however will stay around forever. so if you want to free disk space from old commits you have to at least remove the tags manually.

the reflog is something like a local history of past local repository states. many git commands will record the previous state of the local repository in the reflog. with the reflog you can undo some commands or at least retrieve lost data if you made a mistake. so think before you clear the reflog.

the reflog is entirely local to your local repository.


see also

https://linuxhint.com/git-shallow-clone-and-clone-depth/

http://gitready.com/intermediate/2009/02/09/reflog-your-safety-net.html

How do I edit past git commits to remove my password from the commit logs?

Delete all local git branches

Lesmana answered Sep 19 '22 10:09


git clone --mirror --depth=5  file://$PWD ../temp
rm -rf .git/objects
mv ../temp/{shallow,objects} .git
rm -rf ../temp

This really isn't cloning "from scratch", as it's purely local work and it creates virtually nothing beyond the shallowed-out pack files: probably tens of kilobytes of overhead in total. I'd venture you're not going to get more efficient than this; you'll wind up with custom work that uses more space in the form of scripts and test work than this does in the form of a few kB of temporary repo overhead.

jthill answered Sep 23 '22 10:09