I'm in the process of splitting up an old suite of applications which originally resided in a single Subversion repository.
I've converted it over to a Git repository and removed what I don't want, but I'd like to slim the repository down by getting rid of the historical data associated with the deleted files (the original repository will be maintained for reference purposes so it isn't needed in the new one).
Ideally what I'd like to do is go through the entire repository and remove any files or folders not present in the working directory, along with any history associated with them. This would leave me with the contents of HEAD and a history of commits affecting those files. However, I haven't come across a way of doing this (orphaning HEAD doesn't help as it doesn't preserve the history).
Is this possible? I know how to remove a single file or folder from the entire history via git filter-branch, but there are too many files and folders for that to be a practical approach... unless there's a way of filtering on all files not in HEAD?
Run the following command: git rm --cached path/to/file. Git prints each file it removes from the index. Use the --cached flag if you want to keep the local copy but stop tracking the file; note that this only affects commits made from then on and leaves the file in the existing history.
Here's how you can use git filter-branch to get rid of all files that you don't want:
Make a list of the filenames you don't want to appear in the history; for files that were renamed, include both the old and the new names. For example, put them in a file called toberemoved.txt.
Run git filter-branch like this:
$ git filter-branch --tree-filter "rm -f `cat toberemoved.txt`" branch1 branch2 ...
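One possible variant of this step, sketched under a few assumptions (a POSIX shell and a Git version that still ships filter-branch): it swaps --tree-filter for --index-filter plus git rm --cached --ignore-unmatch, which the filter-branch man page describes as much faster because it does not check out each tree. The helper name purge_paths is made up for illustration. It reads the list one path per line, so names containing spaces are handled, but the list file's own path must not contain a single quote.

```shell
#!/bin/sh
# purge_paths LIST REV...  (hypothetical helper name)
# Rewrites REV... so that no commit contains any path listed in LIST
# (one path per line, relative to the repository root).
purge_paths() {
  # The filter runs inside a temporary directory, so resolve LIST to an
  # absolute path first. Assumes that path contains no single quote.
  list=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
  shift
  # Newer Git versions print a deprecation warning and pause unless
  # FILTER_BRANCH_SQUELCH_WARNING is set. --force allows a rerun when
  # refs/original/ already exists from a previous rewrite.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter "
    while IFS= read -r p; do
      [ -n \"\$p\" ] || continue
      git rm -r -q --cached --ignore-unmatch -- \"\$p\"
    done < '$list'
  " -- "$@"
}
```

For example, purge_paths toberemoved.txt branch1 branch2 rewrites those branches; filter-branch keeps the original refs under refs/original/ until you delete them.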
Here's the relevant part of the git filter-branch man page:
--tree-filter <command>
This is the filter for rewriting the tree and its contents. The
argument is evaluated in shell with the working directory set to
the root of the checked out tree. The new tree is then used as-is
(new files are auto-added, disappeared files are auto-removed -
neither .gitignore files nor any other ignore rules HAVE ANY
EFFECT!).
So just make sure that all the paths in the list are relative to the root of the checked-out tree.
Update:
To get the list of files that were present in the past but are not in the current working directory, you can run the following. Note that you'll need extra effort to keep the pre-rename history of renamed files:
$ git log --raw | awk '/^:/ { if (!printed[$6]) { print $6; printed[$6] = 1 } }' | while read -r f; do if [ ! -f "$f" ]; then echo "Deleted: $f"; fi; done
Here $6 is the field of each --raw line of git log that holds the name of the file affected by the commit; for a rename, $6 is the old name and $7 the new one.
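The loop above compares against the working directory and splits fields on whitespace. Below is a sketch that compares against the tree of HEAD instead (so untracked or uncommitted files don't matter) and works line by line, assuming a POSIX shell with mktemp and comm; list_paths_not_in_head is a hypothetical name. It passes --no-renames so that a rename is reported as the old name plus the new name, and its output could be saved as toberemoved.txt. Git still prints paths containing tabs, newlines, double quotes or backslashes in quoted form, so such names would need extra handling.

```shell
#!/bin/sh
# list_paths_not_in_head [REV...]  (hypothetical helper name)
# Prints, sorted, every file path that appears in the history of REV...
# (default: HEAD) but is not tracked in the tree of HEAD.
list_paths_not_in_head() {
  [ $# -eq 0 ] && set -- HEAD
  _lp_tmp=$(mktemp -d) || return 1
  # Files tracked in HEAD, one per line.
  git -c core.quotePath=false ls-tree -r --name-only HEAD |
    LC_ALL=C sort -u > "$_lp_tmp/head"
  # Every path touched by any commit; --no-renames reports a rename as a
  # deletion of the old name plus an addition of the new one.
  git -c core.quotePath=false log --no-renames --name-only --format= "$@" |
    sed '/^$/d' | LC_ALL=C sort -u > "$_lp_tmp/all"
  # Keep only the lines that occur in the first file and not the second.
  LC_ALL=C comm -23 "$_lp_tmp/all" "$_lp_tmp/head"
  rm -rf "$_lp_tmp"
}
```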
See the --diff-filter option of git log if you want to know what happened to each file in every commit ([D]eleted, [R]enamed, [M]odified, and so on).
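As a small illustration of --diff-filter, here is a sketch (assuming a POSIX shell; list_deleted_paths is a hypothetical name) that lists every path some commit deleted. With --no-renames, the old name of a renamed file also counts as a deletion; without it, Git may report the change as R and the old name would not appear.

```shell
#!/bin/sh
# list_deleted_paths [REV...]  (hypothetical helper name)
# Prints, sorted, every path deleted by some commit in REV... (default: HEAD).
list_deleted_paths() {
  [ $# -eq 0 ] && set -- HEAD
  # --diff-filter=D keeps only deletions; --no-renames turns each rename
  # into a deletion of the old name plus an addition of the new one.
  git -c core.quotePath=false log --no-renames --diff-filter=D \
    --name-only --format= "$@" | sed '/^$/d' | LC_ALL=C sort -u
}
```

A path that was deleted and later re-added is still listed, so check the result against HEAD before removing anything. To see renames with both names instead, git log --diff-filter=R --name-status prints the old and new path on each R line.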
Maybe others can chime in on how to find out the previous name of a tracked file in case of renames.
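One way to find previous names is git log --follow, which tracks a single file across renames. The sketch below assumes a POSIX shell; previous_names and all_names_of_head_files are hypothetical names. The second function collects every name ever used by a file that is still in HEAD, which is the set you would take out of toberemoved.txt to keep the pre-rename history.

```shell
#!/bin/sh
# previous_names FILE  (hypothetical helper name)
# Prints, sorted, every name FILE has had in the history of HEAD.
previous_names() {
  git -c core.quotePath=false log --follow --name-only --format= HEAD -- "$1" |
    sed '/^$/d' | LC_ALL=C sort -u
}

# all_names_of_head_files  (hypothetical helper name)
# Runs previous_names for every file tracked in HEAD. This starts one
# git log per file, so it can be slow on large repositories.
all_names_of_head_files() {
  git -c core.quotePath=false ls-tree -r --name-only HEAD |
    while IFS= read -r f; do previous_names "$f" < /dev/null; done |
    LC_ALL=C sort -u
}
```

If both toberemoved.txt and the output of all_names_of_head_files are sorted with LC_ALL=C, LC_ALL=C comm -23 on the two files yields the paths that are safe to remove without losing the pre-rename history of surviving files.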