
Remove all except certain folders from git history

Tags:

git

I have a complex git repo from which I would like to delete ALL files and history except for two folders, let's say:

foo/a
bar/x/y

While git filter-branch --subdirectory-filter would let me select one folder, and make that the new root, it doesn't seem to give me any option for selecting two directories, and preserving their placement.

git filter-branch --tree-filter or --index-filter seems like it would let me iterate through every commit in history and use git rm on each unwanted folder.

I cannot seem to find any working way to get these commands to just preserve the two folders I desire while clearing everything else.

Thanks!

Elon Sharp asked Mar 16 '17 12:03


2 Answers

For files, I've done this with git fast-export. But I'm not sure that would work recursively on directories, so I'd suggest combining git fast-export with find.

git fast-export HEAD -- `find foo/a bar/x/y -type f` >../myfiles.fi

Then create a new repo, and import the streams.

 git init
 git fast-import <../myfiles.fi

Roland Smith answered Sep 28 '22 17:09


You are correct: a tree filter or an index filter would be the way to do this with git filter-branch.

The tree filter is easier, but much slower (easily 10 to 100 times slower). A tree filter runs your supplied command in a temporary directory that contains all, and only, the files that were present in the original commit (the one now being copied). Any files your command leaves behind remain in the copied commit. Any files your command creates in the temporary directory are also in the copied commit. (You may create or remove directories within the temporary directory with no effect either way, since Git stores only the files.) Hence, to remove everything except the directories A and B, write a command that removes every file outside them:

find . -path ./A -prune -o -path ./B -prune -o ! -type d -print0 | xargs -0 rm -f

for instance.
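
Applied to the two directories from the question, the tree-filter idea might look like the sketch below. The function name keep_two_dirs is hypothetical, and the sketch assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do). It is meant to be tried on a throwaway clone, not taken as a definitive recipe.

```shell
#!/bin/sh
# Sketch only: keep foo/a and bar/x/y (the question's two directories)
# and delete every other file or symlink below the current directory.
# keep_two_dirs is a hypothetical name used for illustration.
keep_two_dirs() {
    # -path ... -prune stops find from descending into the kept directories;
    # "! -type d" skips directories so rm is never asked to remove one;
    # rm -f also succeeds when there is nothing left to delete.
    find . -path ./foo/a -prune -o -path ./bar/x/y -prune -o \
        ! -type d -print0 | xargs -0 rm -f
}

# Possible use as a tree filter. The command must be given inline, because
# git filter-branch runs it in its own shell and cannot see this function:
#
#   git filter-branch --prune-empty --tree-filter \
#     'find . -path ./foo/a -prune -o -path ./bar/x/y -prune -o ! -type d -print0 | xargs -0 rm -f' \
#     -- --all
```

Empty directories left behind do not matter, since Git records only files. The --prune-empty option drops commits that end up with no changes once everything else has been removed.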

The index filter is harder, but faster because Git does not have to copy all the files out to a file tree and then re-scan the file tree to build a new index, in order to copy the original commit. Instead, it provides only an index, which you can then manipulate with commands like git rm -rf --cached --ignore-unmatch for instance, or git update-index for the most general case. But, now the only tools you have are those in Git that manipulate the index. There is no fancy Unix find command.

You do, of course, have git ls-files, which reads out the current contents of the index. Hence you can write a program in whatever language you like (I would use Python first here, probably, others might start with Perl) that in essence does:

for (all files in the index)
    if (file name starts with 'A/' or 'B/')
        do nothing
    else
        add to removal list
invoke "git rm --cached" on paths in removal list

If you are willing to trust that no file name has an embedded newline, the above can be done in regular shell as:

git ls-files | while IFS= read -r path; do
    case "$path" in A/*|B/*) continue;; esac
    git rm -q --cached -- "$path"
done

which is not terribly efficient (one git rm --cached per path!) but should work "out of the box" as an --index-filter.

(Untested, but probably works and should be significantly more efficient: pipe git ls-files output through grep -v to remove desired files, and pipe grep output into git update-index --force-remove --stdin. This still assumes no newlines in path names.)
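
The grep-based variant described above might be sketched as follows for the question's two directories. The helper name paths_to_remove is hypothetical, the patterns assume the kept directories are foo/a and bar/x/y, and, as noted, path names containing newlines are not handled. Like the original suggestion, the full filter-branch invocation in the comment is untested.

```shell
#!/bin/sh
# Sketch only: read index paths on stdin (one per line) and print the ones
# that are NOT under foo/a/ or bar/x/y/, i.e. the paths to drop.
# paths_to_remove is a hypothetical name used for illustration.
paths_to_remove() {
    # grep -v inverts the match; each -e adds one anchored "keep" prefix.
    grep -v -e '^foo/a/' -e '^bar/x/y/'
}

# Possible use as an index filter. The pipeline is given inline because
# git filter-branch evaluates it in its own shell:
#
#   git filter-branch --prune-empty --index-filter \
#     'git ls-files | grep -v -e "^foo/a/" -e "^bar/x/y/" | git update-index --force-remove --stdin' \
#     -- --all
```

This runs one git update-index per commit instead of one git rm per path. grep exits non-zero when a commit contains only kept files, but the pipeline's status comes from git update-index, so that case should not abort the filter.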

torek answered Sep 28 '22 18:09