
How to extract a subset of history from Git?

Update: I tried to simplify the real example here to get a clear explanation of my options, but that didn't really work. The linked examples below so far are too general to get even this simple example working.

I was able to do this type of thing with SVN all the time and got quite skilled at it. Now I'm finding it extremely difficult in Git and starting to believe that my history is basically too munged together to be able to pull it apart.

Real-world problem: I have around a dozen files that have been moved and renamed. Their history is intermixed with the history of hundreds of other files whose history I want to remove completely.

In SVN, I would be able to use a sequence of dump/include-filter/exclude-filter/load to trim the repository down. Only rarely would I need to rename paths manually in the dump file itself before loading.

Something like this and I would have been done:

SET Includes=trunk/src/Foo.aaa trunk/src/Foo.bbb trunk/src/Foo trunk/src/Bar
SET Excludes=trunk/src/Bar/Blah.aaa trunk/src/Foo/Blah.aaa

svnadmin dump FooSrc > Full.dump 2> Dump.log
svndumpfilter include %Includes% --skip-missing-merge-sources --renumber-revs --drop-empty-revs < Full.dump > Filter_1.dump 2> Filter_1.log
svndumpfilter exclude %Excludes% --skip-missing-merge-sources --renumber-revs --drop-empty-revs < Filter_1.dump > Filter_2.dump 2> Filter_2.log
svnadmin create FooDest
svnadmin load FooDest --ignore-uuid < Filter_2.dump > Load.log 2> Load_Errors.log

Does anyone have a good example of this that is more than just a trivial removal of a single file or export of a single subdirectory?

The simplest way I can define the set of files is with a list of 7 directory paths. Everything inside those directories needs to be kept, and everything outside needs to be pruned from the history.


Simplified problem:

I have a Git repository with a handful of files that I'd like to extract into their own repository. The problem is that these files were created and modified throughout the history of the original repository, so I am having trouble figuring out how to cleanly extract them.

Here is the gist of what my history looks like (only with more commits and lots more to ignore). As you can see, I obviously didn't plan for these files to be cherry-picked out of the history later:

commit 4a09d3f977a8595d9e3f61766a5fd743e4265a56

M    src/Foo/Bar/FileToExtract2.foo
A    src/Foo/Bar/FileToExtract3.bar
D    src/Foo/AnotherFileToIgnore.txt

commit 05d26f23518083270cc45bf037ced29bec45e064

M    src/Foo/Blah/IgnoreThisOneToo.foo
M    src/Foo/AnotherFileToIgnore.txt

commit 343187228f4bd8e4427395453034c34ebd9a95f3

M    src/Foo/Bar/FileToExtract1.txt
M    src/Foo/AnotherFileToIgnore.txt

commit 46a0129104ac31291462f657292aab43f8883d8d

A    src/Foo/Bar/FileToExtract1.txt
A    src/Foo/Bar/FileToExtract2.foo
M    src/Foo/FileToIgnore.txt

commit 3fe6af56f0d8dc42fcb5b0bafee41bff534ba2cc

A    src/ReadMe.txt
A    src/IgnoreMe.foo
A    src/Foo/FileToIgnore.txt
A    src/Foo/Blah/IgnoreThisOneToo.foo
A    src/Foo/AnotherFileToIgnore.txt

In the end, what I want to have is a clean repository with the complete history of just the files in src/Foo/Bar/. The rest can be ignored. I'm also okay with keeping this repository as is (i.e. no history rewrite) and just committing a delete for that entire directory.

In SVN, I would use svnadmin dump, svndumpfilter, and svnadmin load. If I was careful, I could even manually edit the dump file to clean up paths, etc.

I've been looking through the Git commands and am unable to see a way of doing this. Any help would be greatly appreciated.

asked Sep 08 '10 17:09 by mckamey




2 Answers

You can use git filter-branch to detach the directory src/Foo/Bar into its own repository.
See:

  • "Detach subdirectory into separate Git repository".
  • "Howto extract a git subdirectory and make a submodule out of it?", which illustrates the inverse (remove everything but the files you want to keep).
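
As a rough sketch of both variants, the script below builds a throwaway repository and rewrites two clones of it. Every path, file name, and commit message is a made-up stand-in for the real layout. The first clone uses --subdirectory-filter to turn one directory into the root of a new history. The second uses an --index-filter that empties the index for each commit and re-adds only a list of directories, which is one possible way to express "keep these paths, drop everything else", not the only one.

```shell
#!/bin/sh
set -eu

# Build a throwaway repository; all paths and names here are hypothetical.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir -p src/Foo/Bar src/Baz
echo a > src/Foo/Bar/FileToExtract1.txt
echo b > src/Baz/Keep.txt
echo x > src/IgnoreMe.foo
git add .
git commit -q -m "Add kept and ignored files"

echo y >> src/IgnoreMe.foo
git commit -q -am "Touch only an ignored file"

echo c >> src/Foo/Bar/FileToExtract1.txt
git commit -q -am "Modify a kept file"

# filter-branch rewrites history in place, so operate on clones only.
# The variable below silences the notice that newer Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Variant 1: a single directory becomes the root of the new history.
git clone -q "$work/demo" "$work/one-dir"
cd "$work/one-dir"
git filter-branch --prune-empty --subdirectory-filter src/Foo/Bar HEAD

# Variant 2: keep several directories at their original paths.
# For each commit, empty the index, then re-add only the listed paths.
git clone -q "$work/demo" "$work/keep-list"
cd "$work/keep-list"
git filter-branch --prune-empty --index-filter '
    git read-tree --empty &&
    git ls-tree -r "$GIT_COMMIT" -- src/Foo/Bar src/Baz |
        git update-index --index-info
' HEAD

# Inspect the rewritten histories.
git -C "$work/one-dir" log --oneline --name-status
git -C "$work/keep-list" log --oneline --name-status
```

In both clones the commit that touched only src/IgnoreMe.foo disappears, because --prune-empty drops commits that no longer change anything. Note that the current Git documentation steers people toward the separate git filter-repo tool for this kind of rewrite; filter-branch is used here only because it is the command this answer names.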
answered Sep 28 '22 04:09 by VonC


The equivalent of SVN's svnadmin dump, svndumpfilter and svnadmin load would be git fast-export, a script of your own (see examples) and git fast-import.
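
A minimal sketch of that dump/filter/load shape, using a throwaway repository with made-up paths and commit messages. Instead of a hand-written filter script, it leans on the fact that git fast-export accepts rev-list style arguments, including a path limiter after "--", so the filtering happens at export time. A custom script sitting between the two commands would be the place to rename paths, much like editing an SVN dump file.

```shell
#!/bin/sh
set -eu

# Throwaway source repository; all paths and names here are hypothetical.
work=$(mktemp -d)
git init -q "$work/source"
cd "$work/source"
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir -p src/Foo/Bar
echo x > src/IgnoreMe.foo
git add .
git commit -q -m "Add only an ignored file"

echo a > src/Foo/Bar/FileToExtract1.txt
echo y >> src/IgnoreMe.foo
git add .
git commit -q -m "Add a kept file and touch an ignored one"

echo b >> src/Foo/Bar/FileToExtract1.txt
git commit -q -am "Modify the kept file"

# "dump | filter | load": export only the history that touches the kept
# path, then load the stream into a brand-new, empty repository.
git init -q "$work/dest"
git fast-export --all -- src/Foo/Bar |
    git -C "$work/dest" fast-import --quiet

# Inspect the imported history; the source repository is left unchanged.
git -C "$work/dest" log --all --oneline --name-status
```

Unlike --subdirectory-filter, this keeps the files at their original paths (src/Foo/Bar/...). fast-import does not touch the working tree, so a checkout or git reset --hard in the new repository is needed before the files appear on disk.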

answered Sep 28 '22 04:09 by Jakub Narębski