
git remove oldest revisions of a file

Tags:

git

I have a large file (33 MB) and want to permanently delete its oldest revisions, so that only the latest X revisions are kept. How can I do that?

My bare repository has grown huge because of it.

I have tried the following, but it removes the file entirely:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_manual.txt' HEAD

To identify the large files in my repository I use git-large-blob by Aristotle Pagaltzis.
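
For readers without that script, here is a rough sketch of how the largest blobs could be listed with stock Git commands only. It is not how git-large-blob works internally (I have not checked); the throwaway repository and the file contents are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: list the largest blobs reachable from any ref, biggest first.
# A throwaway repository is created so the commands can be tried safely.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

# Two revisions of a (hypothetical) large file plus one small file.
head -c 200000 /dev/zero > big_manual.txt
echo small > README
git add . && git commit -q -m "first"
head -c 300000 /dev/zero > big_manual.txt
git commit -q -am "second"

# Columns: object type, object name, size in bytes, path.
# Keep only blobs and sort by size, descending.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob"' | sort -k3,3nr | head -n 5)
echo "$largest"
```

Note that the sizes shown are uncompressed object sizes, so they indicate which files are worth looking at rather than how much disk space each one occupies in the pack.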

neoneye asked May 30 '09 21:05


3 Answers

I think you are on the right track with the git filter-branch command you tried. The problem is that you haven't told it to keep the file in any commits, so it is removed from all of them. I don't think there is a way to tell git-filter-branch directly to skip commits. However, since the filter commands are run in a shell context, it shouldn't be too difficult to use the shell to remove the file from all but the last X commits. Something like this:

# The filter is evaluated by /bin/sh, so use the POSIX [ ] test
# instead of the bash-only [[ ]].
KEEP=10 I=0 NUM_COMMITS=$(git rev-list master | wc -l) \
git filter-branch --index-filter \
'if [ "$I" -lt "$((NUM_COMMITS - KEEP))" ]; then
     git rm --cached --ignore-unmatch big_manual.txt;
 fi;
 I=$((I + 1))'

That would keep big_manual.txt in the last 10 commits of the branch (it counts commits, not revisions of the file).
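
If the cut-off should be based on revisions of the file rather than on commits, one possible way to find the boundary is to ask git rev-list for the commits that touched the file. The following is only a sketch on a throwaway repository; the file names, commit messages and KEEP value are invented for the example.

```shell
#!/bin/sh
# Sketch: find the oldest commit among the last KEEP revisions of a file.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

# Five revisions of the manual, interleaved with unrelated commits.
for n in 1 2 3 4 5; do
  echo "manual rev $n" > big_manual.txt
  git add big_manual.txt && git commit -q -m "manual rev $n"
  echo "code $n" > code.txt
  git add code.txt && git commit -q -m "code change $n"
done

KEEP=3
# rev-list prints newest first, so the last line is the oldest kept revision.
boundary=$(git rev-list -n "$KEEP" HEAD -- big_manual.txt | tail -n 1)
subject=$(git log -1 --format=%s "$boundary")
echo "oldest revision to keep: $boundary ($subject)"
```

The resulting commit could then serve as the point from which the file is kept, with everything before it being filtered.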

That being said, as Charles has mentioned, I'm not sure this is the best approach, since you're in effect undoing the whole point of a VCS by deleting old versions.

Have you already tried optimizing the git repository with git-gc and/or git-repack? If not, those might be worth a try.
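
Keep in mind that after any rewrite the old objects are not freed right away: reflogs still reference them (and git filter-branch additionally keeps the pre-rewrite refs under refs/original/). Here is a small sketch of the clean-up, assuming nothing else references the old objects. A deleted branch stands in for the pre-rewrite history so the example does not need to run filter-branch itself.

```shell
#!/bin/sh
# Sketch: an unreachable blob survives until reflogs are expired and
# the repository is garbage-collected with immediate pruning.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo
echo hello > README
git add README && git commit -q -m "base"

# Commit a big file on a side branch, then throw the branch away.
git checkout -q -b big
head -c 200000 /dev/urandom > big_manual.txt
git add big_manual.txt && git commit -q -m "add big file"
blob=$(git rev-parse HEAD:big_manual.txt)
git checkout -q -
git branch -q -D big

# No branch references the blob any more, but the HEAD reflog still does.
before=missing
if git cat-file -e "$blob" 2>/dev/null; then before=present; fi

# Expire all reflog entries, then repack and prune unreachable objects now.
git reflog expire --expire=now --all
git gc -q --prune=now

after=gone
if git cat-file -e "$blob" 2>/dev/null; then after=present; fi
echo "before gc: $before, after gc: $after"
```

On a real repository, double-check the rewritten history before expiring reflogs, since this removes the easy way back.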

Dan Moulding answered Oct 11 '22 04:10


Note: this answer is about shortening the history of a whole project, rather than removing a single file from older history, which is what the question was about!


The simplest way to shorten the history of a whole project with git filter-branch is to use the grafts mechanism (see the repository layout documentation):

$ echo "$commit_id" >> .git/info/grafts

where $commit_id is the commit that you want to become the root (first commit) of the new history. Check with "git log" or a graphical history viewer such as gitk that the history looks the way you want, and then run "git filter-branch --all"; the use of grafts is described in the git-filter-branch documentation.

Alternatively, you can create a shallow clone by using the --depth <depth> option of git clone.
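
As a rough illustration of the shallow-clone route, the sketch below builds a small throwaway repository (names and contents are invented) and clones only its two most recent commits.

```shell
#!/bin/sh
# Sketch: a shallow clone only fetches the most recent commits.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q origin-repo
(
  cd origin-repo
  git config user.email demo@example.com
  git config user.name Demo
  for n in 1 2 3 4 5; do
    echo "manual rev $n" > big_manual.txt
    git add big_manual.txt && git commit -q -m "revision $n"
  done
)

# Use a file:// URL: with a plain local path, git ignores --depth.
git clone -q --depth 2 "file://$demo/origin-repo" shallow-repo
count=$(git -C shallow-repo rev-list --count HEAD)
echo "commits visible in the shallow clone: $count"
```

This shrinks the clone, not the original repository, so it helps people fetching the project more than it helps the bare repository itself.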



You can also use grafts to remove part of the history of a single file (what was originally requested) using the steps described below. This solution consists of more steps than the solution proposed by Dan Moulding, but each step is simpler, and you can check the intermediate results using "git log" or a graphical history viewer.

  1. First, select the point where you want the file to be removed, and mark it by creating a branch there. For example, if you want the file to appear for the first time in commit f020285b and be removed from all its ancestors, mark its parent (assuming this is an ordinary, non-merge commit) using

    $ git branch cleanup f020285b^
    
  2. Second, remove the file from cleanup (i.e. f020285b^) and all of its ancestors using git-filter-branch, as shown in the "Examples" section of the git-filter-branch manpage:

    $ git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_manual.txt' cleanup
    

    If you also want to remove all commits that changed only the removed file, you can additionally pass the --prune-empty option to git-filter-branch.

  3. Next, join rewritten part of history with the rest of history using grafts mechanism:

    $ echo "$(git rev-parse f020285b) $(git rev-parse cleanup)" >> .git/info/grafts
    

    Then you can examine the history to check that it is joined correctly.

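  4. Last, make the grafts permanent (this makes all grafts permanent, but let's assume here that you don't use grafts otherwise) using git-filter-branch (if it refuses to run because the backup refs from step 2 still exist under refs/original/, add -f),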

    $ git filter-branch cleanup..HEAD
    

    and remove the grafts file (as it is not needed any more) and the cleanup branch

    $ rm .git/info/grafts
    $ git branch -d cleanup
    

Final note: if you remove part of the history of some file, you had better make sure that the project makes sense without this file (and, for example, still compiles correctly).

Jakub Narębski answered Oct 11 '22 06:10


You might want to consider using git submodules. That way you can keep the images and other big files in a separate git repository, and the repository that holds the source code can refer to a particular revision of that other repository.

That will help you keep the repository revisions in sync, because the parent repository contains a link to a particular revision of the sub-repository. It will also let you remove or rebase old revisions in the sub-repository without affecting the parent repository where your source code lives: removing old revisions in the sub-repository will not mess up the history of the parent repository, because you only update which revision the sub-repository link in the parent repository points to.
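
A minimal sketch of such a setup follows, using two throwaway repositories. The names "assets" and "project" are made up, and the protocol.file.allow setting is only there because recent Git versions refuse local-path submodules by default.

```shell
#!/bin/sh
# Sketch: keep the big file in its own repository and pin it as a submodule.
set -e
demo=$(mktemp -d)
cd "$demo"

# Hypothetical repository that holds the big files.
git init -q assets
(
  cd assets
  git config user.email demo@example.com
  git config user.name Demo
  echo "manual v1" > big_manual.txt
  git add big_manual.txt && git commit -q -m "manual v1"
)

# Hypothetical repository that holds the source code.
git init -q project
cd project
git config user.email demo@example.com
git config user.name Demo
echo "int main(void) { return 0; }" > main.c
git add main.c && git commit -q -m "source"

# Add the assets repository as a submodule and record the pinned revision.
git -c protocol.file.allow=always submodule --quiet add "$demo/assets" assets
git commit -q -m "track assets as a submodule"

pinned=$(git rev-parse HEAD:assets)
entry=$(git ls-tree HEAD assets | awk '{print $1, $2}')
echo "parent repository pins assets at $pinned ($entry)"
```

The parent repository stores only a commit id for the assets directory (mode 160000), so its own history stays small regardless of how large the sub-repository becomes.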

Esko Luontola answered Oct 11 '22 06:10