
filtering of files and paths from gitignore

Tags: git, c#, gitignore

I would like to find all file paths that are not filtered by a .gitignore (or any nested .gitignore files within sub-directories) using C#. This is similar to the question here with regard to PHP. Does anyone know whether such code has already been made available (in C#) somewhere online?

UPDATE: To answer what I want this for: I want to run my own little periodic backup of my source files for certain projects (zipping the result), for added peace of mind. The hard part is getting a robust .gitignore parser that yields the non-ignored file paths (and excludes the others). I would rather not become too embroiled in learning that spec if someone else has already done the work.

Nicholas Petersen asked Sep 10 '15 15:09



3 Answers

Well, the best way to parse .gitignore files (and the other files Git uses, such as $GIT_DIR/info/exclude) is to get Git to do it for you. :-) (In your case, most cases in fact, this does involve executing a git subprocess.)

git check-ignore

The git check-ignore command can be used to detect which files are ignored and why. The --non-matching option makes it tell you about files that are not ignored as well, though since it still tells you about ignored files, too, and in a special format, you'll need to do a little bit of further work to get a simple list of non-ignored files. This Bourne shell function does the trick:

find_nonignored() {
    # List every path except .git, ask Git which ones are ignored, keep the rest.
    find . -path ./.git -prune -o -print \
        | git check-ignore --verbose --non-matching --stdin \
        | sed -n -e 's,\t./,\t,' -e 's,^::\t*,,p'
}

How It Works

The find command finds all files in and below the current working directory, which should be somewhere in the tree you're trying to filter. We exclude the top-level .git subdirectory and everything under it from the output, if present. /.git/ is not listed in a typical .gitignore file because Git skips it automatically, so git check-ignore would otherwise report it as "not ignored".

git check-ignore prints --non-matching files only in --verbose mode, because only that mode prints the extra information that tells you whether a file is ignored. (It always prints ignored files.) The paths come out one per line in the format

source:linenum:pattern<TAB>path

The colon-separated fields are information about what caused the path to be ignored (such as a line in the .gitignore file) and will be empty if the file is not ignored.
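To make the format concrete, here is a minimal sketch that splits each output line into its fields using only POSIX shell parameter expansion. The function name explain_check_ignore is hypothetical, and the sketch assumes that neither the source file name nor the path contains a colon or a tab that would confuse the split.

```shell
# Hypothetical helper: reads `git check-ignore --verbose --non-matching`
# output on stdin and labels each path as kept or ignored.
# Assumption: the source file name contains no colon.
explain_check_ignore() {
    tab=$(printf '\t')
    while IFS= read -r line; do
        info=${line%%"$tab"*}   # source:linenum:pattern (or "::")
        path=${line#*"$tab"}    # everything after the first tab
        if [ "$info" = "::" ]; then
            printf 'kept: %s\n' "$path"
        else
            src=${info%%:*}         # text before the first colon
            rest=${info#*:}         # linenum:pattern
            linenum=${rest%%:*}
            pattern=${rest#*:}      # the pattern may itself contain colons
            printf 'ignored: %s (%s line %s, pattern %s)\n' \
                "$path" "$src" "$linenum" "$pattern"
        fi
    done
}
```

It would be fed the same pipeline as in find_nonignored, with this function taking the place of the sed step.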

The sed command then filters the output to show only the paths of the non-ignored files. The -n option tells it not to print out the input lines by default. The first substitution pattern replaces <TAB>./ with just <TAB>, removing the leading ./, for purely aesthetic reasons. The second substitution does the real work: it removes any ::<TAB> (indicating no "ignore" information) that starts a line and, if that substitution happened, prints what's left of the line, which is a non-ignored path.

You can filter this further to do additional processing; I built this for a script that does markdown checking along these lines:

markdownlint $(find_nonignored | grep '\.md$')
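For the backup use case in the question, the list also contains directories (find prints them too), and the $(...) form above splits paths on whitespace. A rough sketch of one way around both issues is to read the list line by line and keep only regular files. The names only_files and backup_nonignored are made up for illustration, and the wrapper assumes a tar that accepts "-T -" to read the file list from stdin (GNU tar does).

```shell
# Hypothetical helper: reads one path per line on stdin and prints only
# those that are regular files. Handles spaces in names, but not newlines.
only_files() {
    while IFS= read -r path; do
        if [ -f "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Hypothetical wrapper: archive the non-ignored files of the current tree
# into the file named by $1. Assumes find_nonignored is already defined
# and that tar supports "-T -" (an assumption; GNU tar does).
backup_nonignored() {
    find_nonignored | only_files | tar -czf "$1" -T -
}
```

Dropping directories from the list matters here: handing tar a directory name would make it recurse into that directory and pick up the ignored files again.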

Notes

  1. This code includes untracked files (i.e., files that have never been added to the Git repo or staged) in the output, which is usually what you want. (Test systems, for example, should still check new files even before they've had git add run on them.) Beware that other solutions involving git ls-files and the like usually don't do this.

  2. The above code relies on using GNU sed, which interprets \t as a tab. If you're using BSD sed (such as on macOS) you probably need to tweak this slightly. Check the comments to see if someone has a hint for this.

  3. All the code here breaks on paths with spaces or other "unusual" characters; it needs to be modified in several places (such as using -print0 with find) to fix this. I do not address issues like this here in order to keep the explanation simple. I also leave for others the generalization of the function to work on arbitrary paths rather than just the current working directory.
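Regarding note 2, one possible tweak (a sketch, not tested against every sed implementation) is to put a literal tab character into the sed expressions instead of relying on \t. The names strip_ignore_info and find_nonignored_portable are hypothetical. This variant still has the limitations described in note 3.

```shell
# Hypothetical filter: same job as the sed step in find_nonignored, but the
# tab is produced by printf, so sed never has to interpret "\t" itself.
strip_ignore_info() {
    tab=$(printf '\t')
    sed -n -e "s,${tab}\./,${tab}," -e "s,^::${tab},,p"
}

# Hypothetical variant of find_nonignored built on the filter above.
find_nonignored_portable() {
    find . -path ./.git -prune -o -print \
        | git check-ignore --verbose --non-matching --stdin \
        | strip_ignore_info
}
```

Splitting the filter into its own function also makes it easy to try on a few hand-written lines before running it against a real repository.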

cjs answered Sep 21 '22 18:09



It's difficult to make suggestions without knowing exactly what you want to do with the list (use it in a build script, process the files in some way, just view them in a UI, etc.).

I couldn't find one in C#, but this JavaScript gitignore parser doesn't have a lot of code to convert and it exposes both an accepts and a denies method to get a list of included or ignored files. It is fairly well documented, has tests, and the regular expressions it uses would work just as well in C# as they do in JavaScript.

This answer would work from C#, provided you have Git installed on the machine where your C# code is running.

Also note that the Git Source Control Provider plugin for Visual Studio provides the list right in the IDE, along with the ability to check boxes and commit certain files together and a lot of other functionality that is difficult to do on the command line.

NOTE: The Git Source Control Provider is open source (written in C#) and you can view the source here, but it may be much more involved to reverse engineer than the JavaScript project.

NightOwl888 answered Sep 20 '22 18:09



For those looking for a C# library, you can check this out as well.

.gitignore based parser implemented in C# according to the .gitignore spec 2.29.2. The library is tested against real git status outputs. The tests use LibGit2Sharp for that.

https://github.com/goelhardik/ignore

It's more or less a port of other open source libraries, and so far it has worked well in my other projects.

Hardik Goel answered Sep 22 '22 18:09
