I'm developing a git post-receive hook in Python. Data is supplied on stdin with lines similar to
ef4d4037f8568e386629457d4d960915a85da2ae 61a4033ccf9159ae69f951f709d9c987d3c9f580 refs/heads/master
The first hash is the old-ref, the second the new-ref and the third column is the reference being updated.
I want to split this into 3 variables, whilst also validating input. How do I validate the branch name?
I am currently using the following regular expression
^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/([0-9a-zA-Z]+)$
This doesn't accept all possible branch names, as set out by man git-check-ref-format. For example, it excludes a branch by the name of build-master, which is valid.
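To make the problem concrete, here is a minimal sketch of how that expression splits a line into three values, and how it rejects build-master. The helper name parse_line is made up for illustration; it is not part of the hook.

```python
import re

# The expression from the question, unchanged.
LINE_RE = re.compile(r"^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/([0-9a-zA-Z]+)$")

def parse_line(line):
    """Return (old, new, branch), or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groups() if match else None

OLD = "ef4d4037f8568e386629457d4d960915a85da2ae"
NEW = "61a4033ccf9159ae69f951f709d9c987d3c9f580"

print(parse_line(f"{OLD} {NEW} refs/heads/master"))
# The hyphen is not in [0-9a-zA-Z], so this valid branch is rejected.
print(parse_line(f"{OLD} {NEW} refs/heads/build-master"))
```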
I actually want to exclude any branch that starts with "build-". Can this be done in the same regex?
Given the great answers below, I wrote some tests, which can be found at https://github.com/alexchamberlain/githooks/blob/master/miscellaneous/git-branch-re-test.py.
Status: All the regexes below are failing to compile. This could indicate there's a problem with my script or incompatible syntaxes.
Let's dissect the various rules and build regex parts from them:
They can include slash / for hierarchical (directory) grouping, but no slash-separated component can begin with a dot . or end with the sequence .lock.
# must not contain /.
(?!.*/\.)
# must not end with .lock
(?<!\.lock)$
They must contain at least one /. This enforces the presence of a category like heads/, tags/ etc. but the actual names are not restricted. If the --allow-onelevel option is used, this rule is waived.
.+/.+ # may get more precise later
They cannot have two consecutive dots .. anywhere.
(?!.*\.\.)
They cannot have ASCII control characters (i.e. bytes whose values are lower than \040, or \177 DEL), space, tilde ~, caret ^, or colon : anywhere.
[^\000-\037\177 ~^:]+ # pattern for allowed characters
They cannot have question-mark ?, asterisk *, or open bracket [ anywhere. See the --refspec-pattern option below for an exception to this rule.
[^\000-\037\177 ~^:?*[]+ # new pattern for allowed characters
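As a quick sanity check, the character class can be tried on its own in Python, whose re module accepts the octal escapes as written. This is only a sketch; only_allowed_chars is a hypothetical helper name, and it tests the character rule alone, not the other rules.

```python
import re

# Allowed characters: anything except control characters (\000-\037),
# DEL (\177), space, and ~ ^ : ? * [
ALLOWED = re.compile(r"[^\000-\037\177 ~^:?*[]+")

def only_allowed_chars(name):
    """True if every character of name is in the allowed set."""
    return ALLOWED.fullmatch(name) is not None

for candidate in ["refs/heads/master", "refs/heads/a b", "refs/heads/a~1", "refs/heads/a[b"]:
    print(candidate, only_allowed_chars(candidate))
```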
They cannot begin or end with a slash / or contain multiple consecutive slashes (see the --normalize option below for an exception to this rule)
^(?!/)
(?<!/)$
(?!.*//)
They cannot end with a dot ".".
(?<!\.)$
They cannot contain a sequence @{.
(?!.*@\{)
They cannot contain a \.
(?!.*\\)
Piecing it all together we arrive at the following monstrosity:
^(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
And if you want to exclude those that start with build- then just add another lookahead:
^(?!build-)(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
This can be optimized a bit as well by conflating a few things that look for common patterns:
^(?!@$|build-|/|.*([/.]\.|//|@\{|\\))[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock|[/.])$
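If this is used from Python, one thing to watch: the re module requires lookbehinds to be fixed-width, so the final (?<!\.lock|[/.]) of the optimized pattern would likely be rejected at compile time. The sketch below splits it into two lookbehinds, which should be equivalent. The function name build_ref_re and its excluded_prefix parameter are my own additions for illustration. Note also that the build- lookahead is anchored at the start of the whole string, so when matching a full ref such as refs/heads/build-master the prefix to exclude would need to be refs/heads/build- instead.

```python
import re

# Character class from above: everything except control characters,
# DEL, space, and ~ ^ : ? * [
CHARS = r"[^\000-\037\177 ~^:?*[]+"

def build_ref_re(excluded_prefix):
    """Compile the combined pattern; names starting with excluded_prefix are rejected.

    excluded_prefix must be non-empty (hypothetical parameter, not from the answer).
    """
    return re.compile(
        r"^(?!@$|" + re.escape(excluded_prefix) + r"|/|.*([/.]\.|//|@\{|\\))"
        + CHARS + "/" + CHARS
        # Two fixed-width lookbehinds instead of (?<!\.lock|[/.])
        + r"(?<!\.lock)(?<![/.])$"
    )

ref_re = build_ref_re("refs/heads/build-")
for ref in ["refs/heads/master", "refs/heads/build-master", "refs/heads/foo.lock"]:
    print(ref, bool(ref_re.match(ref)))
```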
git check-ref-format <ref> with subprocess.Popen is a possibility:
import subprocess
# git check-ref-format exits with status 0 when ref is a well-formed ref name
process = subprocess.Popen(["git", "check-ref-format", ref])
exit_status = process.wait()
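The same idea can be wrapped in a small function. This is only a sketch: is_valid_ref is a made-up name, it uses subprocess.run rather than Popen, and it assumes git is on the PATH of the hook's environment.

```python
import subprocess

def is_valid_ref(ref):
    """Return True if `git check-ref-format` accepts ref (exit status 0)."""
    result = subprocess.run(
        ["git", "check-ref-format", ref],
        stdout=subprocess.DEVNULL,  # only the exit status matters here
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Without --allow-onelevel the command expects a full ref such as refs/heads/master, so the third column of each stdin line can be passed in unchanged.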
Advantages:
Disadvantages:
pygit2, which uses C bindings to libgit2, would be an even better possibility if check-ref-format is exposed there, as it would be faster than Popen, but I haven't found it.