
How to extract one file with commit history from a git repo with index-filter & co

I have a Git repo that was converted from SVN to Mercurial to Git, and I wanted to extract just one source file. Some filenames also contained weird characters (an encoding mismatch had corrupted a Unicode ä) and spaces.

How can I extract one file from a repository and place it at the root of the new repo?

peterhil asked Sep 11 '11 00:09




5 Answers

A faster and easier-to-understand filter that accomplishes the same thing:

git filter-branch --index-filter '
                        git read-tree --empty
                        git reset $GIT_COMMIT -- $your $files $here
                ' \
        -- --all -- $your $files $here
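To see what this filter does, here is a minimal sketch on a throwaway repository. The layout, the file names (src/keep.txt, docs/other.txt) and the demo identity are made up for illustration, and `git reset -q` plus the quotes around `$GIT_COMMIT` are small additions in this sketch, not part of the command above.

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical scratch repo with one file to keep and one to drop.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p src docs
echo one > src/keep.txt
echo a > docs/other.txt
git add -A && git commit -q -m "add both files"
echo b >> docs/other.txt
git add -A && git commit -q -m "touch only the other file"
echo two >> src/keep.txt
git add -A && git commit -q -m "touch the kept file"

git filter-branch --index-filter '
    git read-tree --empty                       # start each commit from an empty index
    git reset -q "$GIT_COMMIT" -- src/keep.txt  # put back only the wanted path
' -- --all -- src/keep.txt >/dev/null

files=$(git ls-tree -r --name-only HEAD)   # paths left in the rewritten tip
commits=$(git rev-list --count HEAD)       # commits left in the rewritten history
echo "$files"
echo "$commits"
```

The trailing `-- src/keep.txt` limits the rewrite to commits that touch the file, so the commit that only changed docs/other.txt is not carried over.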
jthill answered Sep 24 '22 08:09


It turns out this is not particularly easy, which is why I am answering my own question despite the many similar questions about git filter-branch's --index-filter, --subdirectory-filter and --tree-filter: I needed all three to achieve this!

First, a quick note: even a spell like the one in a comment on Splitting a set of files within a git repo into their own repository, preserving relevant history

SPELL='git ls-tree -r --name-only --full-tree "$GIT_COMMIT" | grep -v "trie.lisp" | tr "\n" "\0" | xargs -0 git rm --cached -r --ignore-unmatch'
git filter-branch --prune-empty --index-filter "$SPELL" -- --all

will not help with files named like imaging/DrinkkejaI<0300>$'\302\210'.txt_74x2032.gif. The aI<0300>$'\302\210' part once was a single letter: ä.
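One likely reason the newline-based pipeline trips on such names is that git ls-tree quotes unusual path names unless -z is given. The sketch below shows a NUL-delimited variant on a throwaway repository. The file names are made up, and `grep -a -z` is assumed to be GNU grep, so treat this as an illustration, not a tested fix for the repository in question.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical scratch repo with one awkwardly named file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir imaging
odd=$(printf 'imaging/Drinkkej\303\244 74x2032.gif')   # UTF-8 "ä" plus a space
echo keep > trie.lisp
echo drop > "$odd"
git add -A && git commit -q -m "initial"

# Default output: unusual paths may come back quoted and escaped.
git ls-tree -r --name-only --full-tree HEAD

# -z emits raw, NUL-terminated paths; grep -z filters them record by record
# and xargs -0 passes each one to git rm as a single argument.
git ls-tree -r --name-only --full-tree -z HEAD |
  grep -a -z -v -x -F 'trie.lisp' |
  xargs -0 git rm -q --cached -r --ignore-unmatch

remaining=$(git ls-files)   # what is still in the index
echo "$remaining"
```

Inside an --index-filter, the same pipeline would use "$GIT_COMMIT" in place of HEAD.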

So, to extract a single file, besides the --index-filter pass I also had to run:

git filter-branch -f --subdirectory-filter lisp/source/model HEAD

Alternatively, you can use --tree-filter. The test is needed because the file lived in a different directory in earlier commits (see: How can I move a directory in a Git repo for all commits?):

MV_FILTER='test -f source/model/trie.lisp && mv ./source/model/trie.lisp . || echo "Nothing to do."'
git filter-branch --tree-filter "$MV_FILTER" -- --all
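The filter has to reach git filter-branch as one argument, so the variable should be double-quoted where it is expanded. A small sketch of the difference, using a hypothetical helper count_args that only counts its arguments:

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

MV_FILTER='test -f source/model/trie.lisp && mv ./source/model/trie.lisp . || echo "Nothing to do."'

unquoted=$(count_args $MV_FILTER)    # word-split into many separate arguments
quoted=$(count_args "$MV_FILTER")    # passed through as a single argument

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

With the unquoted form, filter-branch would see only `test` as the filter and treat the remaining words as further options or revisions.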

To see all the names a file has had, use:

git log --pretty=oneline --follow --name-only git-path/to/file | grep -v ' ' | sort -u
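Note that `grep -v ' '` also drops any file name that contains a space. A variant sketch that suppresses the commit header with an empty `--format=` instead, tried on a throwaway repository (file names and demo identity are made up for illustration):

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical scratch repo: the file starts in old/ and is later moved to the root.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir old
printf '(defun trie ())\n' > old/trie.lisp
git add -A && git commit -q -m "add file in old location"
git mv old/trie.lisp trie.lisp
git commit -q -m "move file to root"

# An empty --format= prints no commit header, so only path names (and blank
# lines, removed by sed) are left for sort -u.
names=$(git log --follow --name-only --format= -- trie.lisp | sed '/^$/d' | sort -u)
echo "$names"
```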

As described at http://whileimautomaton.net/2010/04/03012432

Also run these cleanup steps afterwards:

$ git reset --hard
$ git gc --aggressive
$ git prune
$ git remote rm origin # Otherwise changes will be pushed to where the repo was cloned from
peterhil answered Sep 26 '22 08:09


There is a newer tool for this nowadays, git filter-repo. It offers more features and better performance than git filter-branch.

See the man page for details and the project page for installation.

Remove everything except src/README.md and move it to the root:

git filter-repo --path src/README.md
git filter-repo --subdirectory-filter src/

--path selects the single file and --subdirectory-filter moves the contents of that directory to root.

Roman answered Sep 25 '22 08:09


I've found an elegant solution using git log and git am here: https://www.pixelite.co.nz/article/extracting-file-folder-from-git-repository-with-full-git-history/

In case it goes away, here's how you do it:

  1. in the original repo,

    git log --pretty=email --patch-with-stat --reverse --full-index --binary -- path/to/file_or_folder > /tmp/patch
    
  2. if the file was in a subdirectory, or if you want to rename it

    sed -i -e 's/deep\/path\/that\/you\/want\/shorter/short\/path/g' /tmp/patch
    
  3. in a new, empty repo

    git am < /tmp/patch
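The three steps can be tried end to end on throwaway repositories. This is only a sketch: the paths (deep/path/file.txt, short/) and the demo identity are invented, and sed writes to a second file instead of using -i so that it behaves the same with GNU and BSD sed.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# 1. Hypothetical original repo with two commits touching the file.
git init -q "$work/old"
cd "$work/old"
mkdir -p deep/path
echo v1 > deep/path/file.txt
echo x > unrelated.txt
git add -A && git commit -q -m "first"
echo v2 >> deep/path/file.txt
git add -A && git commit -q -m "second"
git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    -- deep/path/file.txt > "$work/patch"

# 2. Shorten the path inside the patch series.
sed -e 's|deep/path/|short/|g' "$work/patch" > "$work/patch.renamed"

# 3. Replay the series in a new, empty repo.
git init -q "$work/new"
cd "$work/new"
git am -q < "$work/patch.renamed"

content=$(cat short/file.txt)          # file content after replaying both commits
commits=$(git rev-list --count HEAD)   # number of commits recreated by git am
echo "$content"
echo "$commits"
```

Because the patches are replayed as new commits, the commit hashes differ from the original ones, while author, date and message are carried over in the mail headers.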
    
Marius Gedminas answered Sep 27 '22 08:09


The following will rewrite the history and keep only commits that touch the list of files you give. You probably want to do that in a clone of your repository to avoid losing the original history.

FILES='path/to/file1 other-path/to/file2 file3'
git filter-branch --prune-empty --index-filter "
                        git read-tree --empty
                        git reset \$GIT_COMMIT -- $FILES
                " \
        -- --all -- $FILES

Then you can merge that new branch into your target repository, via normal merge or rebase commands according to your use-case.

PowerKiKi answered Sep 28 '22 08:09