
git mirroring to GitHub and filtering private files

I'm currently working on a project where we want to open-source our day-to-day commits with full metadata (author, etc.) while filtering out some specific private folders.

Let's say I commit A/file1 and B/file2 on branch master. I would like that commit mirrored on GitHub with the B folder filtered out (the public commit would only contain A/file1).

One way I was thinking to do this is a remote update hook that:

  • Lists all new commits added by newref (let's say lastfoundcommit..newref)
  • Amends those commits one by one (from lastfoundcommit to newref) to remove unwanted files
  • In the process, possibly creates a local master-filtered branch (if it helps to have it locally)
  • Pushes this branch to the public repository
  • Somehow keeps a mapping of commit IDs between private and public commits, to easily compute "lastfoundcommit" on the next push
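The bookkeeping part of these steps (which paths to drop, which commits are new, and the private-to-public ID mapping) could look roughly like the sketch below. It is only an illustration: PRIVATE_PREFIXES, the JSON map file, and all helper names are hypothetical, and the actual rewriting of each commit is deliberately left out.

```python
import json
import subprocess
from pathlib import Path

# Assumption: private content lives under these folder prefixes.
PRIVATE_PREFIXES = ("B/",)


def is_private(path, prefixes=PRIVATE_PREFIXES):
    """True if path is a private folder itself or lies inside one."""
    return any(path == p.rstrip("/") or path.startswith(p) for p in prefixes)


def public_paths(paths, prefixes=PRIVATE_PREFIXES):
    """Keep only the paths that may appear in the public commit."""
    return [p for p in paths if not is_private(p, prefixes)]


def new_commits(last_found, newref, repo="."):
    """Commits in last_found..newref, oldest first (git rev-list --reverse)."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", f"{last_found}..{newref}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def load_map(map_file):
    """Read the private -> public commit ID mapping (empty if no file yet)."""
    p = Path(map_file)
    return json.loads(p.read_text()) if p.exists() else {}


def record(map_file, private_id, public_id):
    """Store one private -> public pair and return the updated mapping."""
    mapping = load_map(map_file)
    mapping[private_id] = public_id
    Path(map_file).write_text(json.dumps(mapping, indent=2))
    return mapping
```

With such a map, "lastfoundcommit" on the next push would simply be the most recent private ID that already has a public counterpart.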

Ideally it could go both ways (i.e., it would be nice if we could also import GitHub branches and pull requests back and have them "rebased" on top of our private repository, either automatically or with a simple process -- probably not so hard, as it is likely just a rebase).

This is somewhat similar to what git-subtree can do, except it is not to extract a subdir but to filter various files instead.

Does that seem feasible? Or any other suggestion? (maybe based on git filter-branch? or any other existing tool/script that might help me?)

Note: submodules are not a viable option: the private files might be sparse, and submodules would get in the way too much. Also, the list of "private" files might grow or change over time.

xen2 asked Mar 14 '26 20:03


1 Answer

I've been in a similar situation:

Don't use git filter-branch

Use BFG Repo-Cleaner instead. It's MUCH faster and easier to use.

You need to prune every commit in your repo regularly, so that none of them mentions your private folders/files.
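As a rough sketch of what such a regular pruning run might execute, the helper below only builds the command lines. The jar name bfg.jar, the repo path, and the function name are assumptions. Two things to check against the BFG documentation before relying on it: --delete-folders matches on folder name rather than on the full path, and BFG by default leaves the contents of the latest commit untouched.

```python
def bfg_cleanup_commands(repo_dir, private_folders, bfg_jar="bfg.jar"):
    """Return the argv lists for one BFG pruning pass over repo_dir.

    Assumption: repo_dir is a clone set aside for publishing, never the
    only copy of the private history.
    """
    # One BFG run per private folder name.
    cmds = [
        ["java", "-jar", bfg_jar, "--delete-folders", name, repo_dir]
        for name in private_folders
    ]
    # After BFG, expire reflogs and garbage-collect so the removed
    # objects are actually dropped from the repository.
    cmds.append(["git", "-C", repo_dir, "reflog", "expire", "--expire=now", "--all"])
    cmds.append(["git", "-C", repo_dir, "gc", "--prune=now", "--aggressive"])
    return cmds
```

Each returned list can be passed to subprocess.run(cmd, check=True). Keep in mind that every pass rewrites history, so public commit IDs change whenever an earlier commit is affected.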

Divide repositories

Keep a full repo AND an OSS one. Let your scripts transfer files from one to the other, and run dedicated tests on the OSS one so that you CAN'T find in it what you're NOT supposed to.

Both repositories should be within your network, preferably on the same machine (security). Only an allowed, checked and clean branch from the OSS repo leaves the machine (pushed to GitHub or wherever).
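One possible shape for such a test on the OSS repo, as a minimal sketch: collect every path ever touched in its history and fail if any falls under a private prefix. PRIVATE_PREFIXES and both function names are hypothetical, and this only checks path names, not file contents.

```python
import subprocess

# Assumption: anything under these prefixes must never reach the OSS repo.
PRIVATE_PREFIXES = ("B/",)


def find_leaks(paths, prefixes=PRIVATE_PREFIXES):
    """Return the sorted, de-duplicated paths that lie under a private prefix."""
    return sorted({p for p in paths if p.startswith(tuple(prefixes))})


def history_paths(repo="."):
    """Every path named by any commit on any ref of the repository."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--all", "--name-only", "--pretty=format:"],
        check=True, capture_output=True, text=True,
    ).stdout
    # The empty format leaves blank separator lines between commits.
    return [line for line in out.splitlines() if line]
```

Running find_leaks(history_paths(...)) before each push and aborting on a non-empty result gives a last line of defence that is independent of whatever script did the filtering.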

Use submodules

Do a POC with submodules / subtrees to see whether they work better than scripts. You'll spend an extra 2 hours and know for sure. I do think they might be viable options since you guys have private folders...

Use project ignore and .git/info/exclude

You may use .git/info/exclude to exclude private files and folders in the OSS repo; that way, even the fact that they are deliberately hidden from Git doesn't leak.

In your usual repo you may add those files to .gitignore on one branch; that branch serves as the initial filter and is later used to feed the OSS repo.
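A small sketch of maintaining those local excludes from a script, assuming the patterns are plain gitignore-style lines (the function name is made up). Note that exclude patterns only affect untracked files: anything already committed stays tracked.

```python
from pathlib import Path


def add_local_excludes(repo_dir, patterns):
    """Append missing patterns to <repo_dir>/.git/info/exclude.

    The file is local to the clone and never pushed, so the patterns
    themselves are not published. Returns the patterns actually added.
    """
    exclude = Path(repo_dir) / ".git" / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    existing = exclude.read_text().splitlines() if exclude.exists() else []
    missing = [p for p in patterns if p not in existing]
    if missing:
        # Rewrite the file, keeping existing lines (comments included) first.
        exclude.write_text("\n".join(existing + missing) + "\n")
    return missing
```

For example, add_local_excludes("oss-repo", ["B/"]) would hide a B folder copied into the OSS working tree from git status and git add, without adding a public .gitignore entry that names it.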

LAFK says Reinstate Monica answered Mar 16 '26 12:03