
How do I check for valid Git branch names?

I'm developing a git post-receive hook in Python. Data is supplied on stdin with lines similar to

ef4d4037f8568e386629457d4d960915a85da2ae 61a4033ccf9159ae69f951f709d9c987d3c9f580 refs/heads/master

The first hash is the old-ref, the second the new-ref and the third column is the reference being updated.

I want to split this into 3 variables, whilst also validating input. How do I validate the branch name?

I am currently using the following regular expression

^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/([0-9a-zA-Z]+)$

This doesn't accept all possible branch names, as set out by man git-check-ref-format. For example, it excludes a branch by the name of build-master, which is valid.

Bonus marks

I actually want to exclude any branch that starts with "build-". Can this be done in the same regex?

Tests

Given the great answers below, I wrote some tests, which can be found at https://github.com/alexchamberlain/githooks/blob/master/miscellaneous/git-branch-re-test.py.

Status: All the regexes below are failing to compile. This could indicate there's a problem with my script or incompatible syntaxes.

Alex Chamberlain asked Aug 23 '12 14:08




2 Answers

Let's dissect the various rules and build regex parts from them:

  1. They can include slash / for hierarchical (directory) grouping, but no slash-separated component can begin with a dot . or end with the sequence .lock.

     # must not contain /.
     (?!.*/\.)
     # must not end with .lock (this checks the end of the whole name only, not every slash-separated component)
     (?<!\.lock)$
    
  2. They must contain at least one /. This enforces the presence of a category like heads/, tags/ etc. but the actual names are not restricted. If the --allow-onelevel option is used, this rule is waived.

     .+/.+  # may get more precise later
    
  3. They cannot have two consecutive dots .. anywhere.

     (?!.*\.\.)
    
  4. They cannot have ASCII control characters (i.e. bytes whose values are lower than \040, or \177 DEL), space, tilde ~, caret ^, or colon : anywhere.

     [^\000-\037\177 ~^:]+   # pattern for allowed characters
    
  5. They cannot have question-mark ?, asterisk *, or open bracket [ anywhere. See the --refspec-pattern option below for an exception to this rule.

     [^\000-\037\177 ~^:?*[]+   # new pattern for allowed characters
    
  6. They cannot begin or end with a slash / or contain multiple consecutive slashes (see the --normalize option below for an exception to this rule)

     ^(?!/)
     (?<!/)$
     (?!.*//)
    
  7. They cannot end with a dot ..

     (?<!\.)$
    
  8. They cannot contain a sequence @{.

     (?!.*@\{)
    
  9. They cannot contain a \.

     (?!.*\\)
    

Piecing it all together we arrive at the following monstrosity:

^(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
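
For readability, the same pattern can be assembled from its parts in Python. This is only a sketch: the names ALLOWED, PARTS and REF_RE are made up for illustration, and the open bracket inside the character class is escaped because newer Python versions warn about an unescaped one.

```python
import re

# Rules 4 and 5: characters allowed anywhere in the name.
ALLOWED = r"[^\000-\037\177 ~^:?*\[]"

PARTS = [
    r"^",
    r"(?!.*/\.)",                        # rule 1: no component begins with a dot
    r"(?!.*\.\.)",                       # rule 3: no two consecutive dots
    r"(?!/)",                            # rule 6: no leading slash
    r"(?!.*//)",                         # rule 6: no consecutive slashes
    r"(?!.*@\{)",                        # rule 8: no "@{" sequence
    r"(?!.*\\)",                         # rule 9: no backslash
    ALLOWED + r"+/" + ALLOWED + r"+",    # rule 2: at least one slash
    r"(?<!\.lock)",                      # rule 1: does not end with ".lock"
    r"(?<!/)",                           # rule 6: no trailing slash
    r"(?<!\.)",                          # rule 7: no trailing dot
    r"$",
]

REF_RE = re.compile("".join(PARTS))

for ref in ("refs/heads/master", "refs/heads/build-master", "refs/heads/a..b"):
    print(ref, bool(REF_RE.match(ref)))
```

Note that `$` in Python also matches just before a trailing newline, so lines read from stdin should be stripped first.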

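And if you want to exclude those that start with build-, add another lookahead. Note that it is anchored at the start of the string, so it rejects strings that themselves begin with build-; to reject a branch such as refs/heads/build-master, the lookahead has to sit after the refs/heads/ prefix instead: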

^(?!build-)(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
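
Applied to the hook input from the question, the checks could be folded into the original three-group regex roughly as follows. This is a sketch, not a definitive implementation: HOOK_LINE_RE and parse_hook_line are hypothetical names, and it assumes the build- exclusion should apply to the branch name that follows refs/heads/. Because that prefix already supplies the required slash, the name itself only needs the allowed-character class.

```python
import re

# Characters allowed in a ref name (rules 4 and 5), with "[" escaped for Python.
ALLOWED = r"[^\000-\037\177 ~^:?*\[]"

HOOK_LINE_RE = re.compile(
    r"^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/"
    r"(?!build-)"   # bonus: reject branch names that start with build-
    r"(?![./])(?!.*/\.)(?!.*\.\.)(?!.*//)(?!.*@\{)(?!.*\\)"
    r"(" + ALLOWED + r"+)"
    r"(?<!\.lock)(?<!/)(?<!\.)$"
)


def parse_hook_line(line):
    """Return (old, new, branch) for an accepted line, or None otherwise."""
    match = HOOK_LINE_RE.match(line.rstrip("\n"))
    return match.groups() if match else None


line = ("ef4d4037f8568e386629457d4d960915a85da2ae "
        "61a4033ccf9159ae69f951f709d9c987d3c9f580 refs/heads/master")
print(parse_hook_line(line))
```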

This can be shortened a bit as well by merging the lookarounds that check for similar patterns:

^(?!@$|build-|/|.*([/.]\.|//|@\{|\\))[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock|[/.])$
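
One caveat for Python users: the re module only accepts fixed-width lookbehinds, so the trailing (?<!\.lock|[/.]) in this merged form does not compile there, which may explain some of the failures reported in the question. A possible workaround, sketched below with made-up names (compiles, VARIABLE_WIDTH_TAIL, MERGED), is to move the end-of-string checks into the leading lookahead.

```python
import re

# Characters allowed in a ref name (rules 4 and 5), with "[" escaped for Python.
ALLOWED = r"[^\000-\037\177 ~^:?*\[]"


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Alternatives of different lengths inside one lookbehind are rejected by re.
VARIABLE_WIDTH_TAIL = r"(?<!\.lock|[/.])$"
print(compiles(VARIABLE_WIDTH_TAIL))

# Same end-of-string checks, expressed inside the leading lookahead instead.
MERGED = re.compile(
    r"^(?!@$|build-|/|.*(?:[/.]\.|//|@\{|\\|\.lock$|[/.]$))"
    + ALLOWED + r"+/" + ALLOWED + r"+$"
)
print(bool(MERGED.match("refs/heads/master")))
```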
Joey answered Oct 14 '22 00:10


git check-ref-format <ref> with subprocess.Popen is a possibility:

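import subprocess

# ref is the full ref name to check, e.g. "refs/heads/master"
process = subprocess.Popen(["git", "check-ref-format", ref])
exit_status = process.wait()  # 0 means the ref name is well-formed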

Advantages:

  • if the algorithm ever changes, the check will update automatically
  • you are sure to get it right, which is way harder with a monster Regex

Disadvantages:

  • slower, because every check spawns a subprocess. But premature optimization is the root of all evil.
  • requires Git as a binary dependency. But in the case of a hook it will always be there.
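
Putting this together for the hook in the question might look like the sketch below. The function names are hypothetical, the build- exclusion is done with a plain string test rather than a regex, and the name is validated as refs/heads/<name> so that no repository context or extra option is needed.

```python
import subprocess


def parse_hook_line(line):
    """Split one post-receive stdin line into (old, new, ref)."""
    old, new, ref = line.rstrip("\n").split(" ", 2)
    return old, new, ref


def is_valid_branch_name(name):
    """Return True if git considers refs/heads/<name> well-formed."""
    result = subprocess.run(
        ["git", "check-ref-format", "refs/heads/" + name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def accepted_branch(line):
    """Return the branch name if the line should be processed, else None."""
    _old, _new, ref = parse_hook_line(line)
    prefix = "refs/heads/"
    if not ref.startswith(prefix):
        return None
    name = ref[len(prefix):]
    if name.startswith("build-") or not is_valid_branch_name(name):
        return None
    return name
```

A malformed line with fewer than three fields raises ValueError in parse_hook_line, so a real hook would want to catch that.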

pygit2, which uses C bindings to libgit2, would be an even better possibility if check-ref-format is exposed there, as it would be faster than Popen, but I haven't found it.