
Find all binary files in git HEAD

I have a huge git repo that I eventually want to clean up with BFG.
But first, I want to track down and remove the files in HEAD that git treats as binary.

So what I'm looking for is a command that finds all files in HEAD that git treats as binary.

These didn't help:

  • List all text (non-binary) files in repo < I am looking for binary files, not text files.
  • Git find all binary files in history < I only care about HEAD.
  • http://git.661346.n2.nabble.com/git-list-binary-and-or-non-binary-files-td3506370.html < I tried those commands and they didn't help.

Thank you in advance for your help.

fabien asked Jun 07 '15 02:06



2 Answers

diff <(git grep -Ic '') <(git grep -c '') | grep '^>' | cut -d : -f 1 | cut -d ' ' -f 2- 

Breaking it down:

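  • git grep -c '' prints the name and matching-line count of every tracked file that has at least one line (the empty pattern '' matches every line). Adding the -I option makes the command skip files that git treats as binary.
  • diff <(cmd1) <(cmd2) uses process substitution to give diff two named pipes, one carrying the output of cmd1 and the other the output of cmd2.
  • grep '^>' keeps the lines that appear only in the second listing (the binary files), and the two cut commands strip the line count and the leading "> " marker, leaving just the filenames.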
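
To see the pieces working together, here is a minimal sketch that wraps the pipeline in a function and runs it against a throwaway repository. The function name list_binary_files_diff, the scratch directory and the two sample files are all made up for illustration; only the pipeline itself comes from the answer. It assumes bash (for process substitution) and git on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper name; the pipeline inside is the one from the answer, unchanged.
list_binary_files_diff() {
  diff <(git grep -Ic '') <(git grep -c '') | grep '^>' | cut -d : -f 1 | cut -d ' ' -f 2-
}

# Build a scratch repository with one text file and one binary file.
demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init -q
printf 'hello\n' > notes.txt
printf 'a\000b\n' > blob.bin      # the NUL byte makes git treat this file as binary
git add notes.txt blob.bin        # git grep only searches tracked files

# Only the binary file should be reported.
binary_files=$(list_binary_files_diff)
echo "$binary_files"
```

One caveat with this approach: because the filename is cut at the first colon, a path that itself contains a colon would come out truncated.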
jangler answered Oct 31 '22 11:10


A simplified solution based on @jangler's answer (https://stackoverflow.com/a/30690662/808101):

comm -13 <(git grep -Il '' | sort -u) <(git grep -al '' | sort -u) 

Explanation:

  1. git grep

    • -l prints only the names of files that match the pattern '' (the empty pattern matches every line of every file)
    • -I makes the command ignore binary files
    • -a forces binary files to be processed as if they were text
  2. sort -u sorts the output of each git grep, since comm only works on sorted input

  3. comm -13 lists the entries that are unique to the 2nd list (the git grep list with all files, including the binary ones)
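
The steps above can be tried out on a throwaway repository. This is only a sketch: the function name list_binary_files_comm, the scratch directory and the sample files are invented for the demo, and it assumes bash (for process substitution) and git on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper name; the command inside is the one from the answer, unchanged.
list_binary_files_comm() {
  comm -13 <(git grep -Il '' | sort -u) <(git grep -al '' | sort -u)
}

# Build a scratch repository with one text file and one binary file.
scratch_repo=$(mktemp -d)
cd "$scratch_repo" || exit 1
git init -q
printf 'plain text\n' > readme.txt
printf '\000\001\002\n' > image.dat   # the NUL byte makes git treat this file as binary
git add readme.txt image.dat          # git grep only searches tracked files

# readme.txt is in both lists, so comm -13 leaves only image.dat.
found=$(list_binary_files_comm)
echo "$found"
```

As written, git grep searches the tracked files in the working tree. Appending a tree-ish such as HEAD to both git grep calls should search that commit instead, but each path is then printed with a "HEAD:" prefix that would need stripping.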

benjarobin answered Oct 31 '22 11:10
