I've been trying to find all the authors of a git project so that I can ask them about relicensing their commits. I figured there'd be no point in contacting every author, since some of them may have had code in the codebase that has since been removed. So I want to contact only the authors whose commits are still visible in the current HEAD.
I was told that git log had this capability, but I couldn't find anything on it except for something like:
git log --format='%an <%ae>'
Which does sort of what I'd like to achieve except it doesn't exclude authors without code in the current codebase.
How can I achieve this?
The git blame command is used to examine the contents of a file line by line and see when each line was last modified and who the author of the modifications was.
The most basic and powerful tool for looking back at a repository's history is the git log command. By default, with no arguments, git log lists the commits made in that repository in reverse chronological order; that is, the most recent commits show up first.
IANAL, but as for the relicensing, I am not sure that it is enough to have permission only from the authors who still have code in the current project. After all, the contributions of the other authors also somehow led to the current state of the project.
That aside you may want to take a look at git blame. It shows what line of a file was introduced in which commit by which author. This should get you closer to the solution of your problem. Maybe some additional post processing with awk ... | sort | uniq can do the rest.
However, git blame only shows information for a single file, so you would have to repeat that for all files in the repository.
In the root directory of the Git repository, you could use a shell command like this on Linux systems:
find ./ -name '*.cpp' -print0 | xargs -0 -I{} git blame --show-email {} | awk 'match($0, /<[^>]+>/) { print substr($0, RSTART, RLENGTH) }' | sort | uniq
This searches for C++ source files (extension *.cpp) with find and runs git blame on each of those files. The option --show-email of git blame shows e-mail addresses instead of names, which are easier to filter for, because names can consist of several words, while an address is usually just one. awk then prints the first <...> token on each line, which is the mail address. (A fixed column number is not reliable here: the short commit hash always comes first, but git blame adds a file name column after it only when some lines originate from a file with a different name.) Finally, sort | uniq is used to get rid of duplicates, showing each address only once.
(Untested, but it may point you in the right direction.)
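As a variation on the same idea, here is a minimal sketch that is not limited to *.cpp files and also counts how many lines each address still owns. It assumes a POSIX shell and is meant to be run inside the repository; the function names count_blame_authors and surviving_authors are made up for this example. It uses git ls-files to enumerate tracked files and git blame --line-porcelain, whose output carries an explicit "author-mail" header for every line, so no column guessing is needed.

```shell
#!/bin/sh
# Hypothetical helper: reads `git blame --line-porcelain` output on stdin
# and prints "<count> <address>" pairs, most lines first.
count_blame_authors() {
    # Header lines start with "author-mail "; file content lines start
    # with a tab, so they can never match this pattern.
    LC_ALL=C sed -n 's/^author-mail //p' |
        sort | uniq -c | sort -rn
}

# Hypothetical helper: blames every tracked file as of HEAD.
# Optional arguments are passed to git ls-files as pathspecs.
surviving_authors() {
    # -z gives NUL-separated paths, safe for names with spaces.
    git ls-files -z -- "$@" |
        # One git blame call per file.
        xargs -0 -n 1 git blame --line-porcelain HEAD -- |
        count_blame_authors
}
```

Calling surviving_authors with no arguments covers all tracked files; surviving_authors '*.cpp' restricts it to C++ sources. Note that it blames every tracked file, so it can be slow on large repositories, and entries that git blame cannot process (submodule paths, for example) only produce an error message on stderr.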
If you just want every author who ever committed anything to the repository, just use
git log --format='%an <%ae>' | sort | uniq
instead.
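Another option for the "every author" case is git shortlog, which groups commits by author for you. The following is only a sketch; the wrapper name all_authors is made up for this example, and the revision defaults to HEAD.

```shell
#!/bin/sh
# Hypothetical wrapper: every author reachable from a revision
# (default HEAD), one per line, with the number of commits.
all_authors() {
    # -s: print only the commit count, -n: sort by that count,
    # -e: include the e-mail address.
    # The revision is named explicitly because git shortlog reads a log
    # from standard input when no revision is given and stdin is not a
    # terminal.
    git shortlog -sne "${1:-HEAD}"
}
```

Like the git log variant, this lists everyone who ever committed, including authors whose code is no longer present in HEAD. If the repository has a .mailmap file, git shortlog applies it, which can merge several addresses of the same person.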