
Checking if an object is in a repo in gitpython

Tags: gitpython

I'm working on a program that will be adding and updating files in a git repo. Since I can't be sure whether a file I am working with is currently in the repo, I need to check for its existence, which seems to be harder than I thought it would be.

The 'in' operator doesn't seem to work below the root level of a tree in gitpython. For example:

>>> from git import Repo
>>> repo = Repo(path)
>>> hct = repo.head.commit.tree
>>> 'A' in hct['documents']
False
>>> hct['documents']['A']
<git.Tree "8c74cba527a814a3700a96d8b168715684013857">

So I'm left to wonder: how do people check that a given file is in a git tree before trying to work on it? Trying to access an object for a file that is not in the tree raises a KeyError, so I could wrap the access in try/except. But that feels like a poor use of exception handling for a routine existence check.

Have I missed something really obvious? How does one check for the existence of a file in a commit tree using gitpython (or really any library/method in Python)?

Self Answer

OK, I dug around in the Tree class to see what __contains__ does. It turns out that when searching in subfolders, one has to check for the existence of a file using the full relative path from the repo's root, even when the tree being searched is itself a subfolder. So a working version of the check I did above is:

>>> 'documents/A' in hct['documents']
True
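
To make that behaviour concrete, here is a toy model of a containment check that compares repo-relative paths. ToyTree is a hypothetical stand-in written for illustration only; it is not GitPython's Tree class and only mirrors the behaviour described above.

```python
import posixpath


class ToyTree:
    """Hypothetical stand-in for a git tree; not GitPython's real class."""

    def __init__(self, path, names):
        self.path = path          # path of this tree relative to the repo root
        self.names = list(names)  # names of the entries directly inside this tree

    def __contains__(self, item):
        # Each entry is compared by its repo-relative path, not by its bare name.
        return any(item == posixpath.join(self.path, name) for name in self.names)


documents = ToyTree("documents", ["A", "B"])
bare_name_found = "A" in documents            # False: the bare name never matches
full_path_found = "documents/A" in documents  # True: the repo-relative path matches
```

At the root the tree's own path is empty, so a bare name and a repo-relative path are the same thing, which would explain why the check only looks broken in subfolders.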
Bill Bushey asked May 05 '12 22:05



2 Answers

EricP's answer has a bug. Here's a fixed version:

import os

def fileInRepo(repo, filePath):
    '''
    repo is a gitPython Repo object
    filePath is the full path to the file from the repository root
    returns true if file is found in the repo at the specified path, false otherwise
    '''
    pathdir = os.path.dirname(filePath)

    # Build up reference to desired repo path
    rsub = repo.head.commit.tree

    # A file at the repository root has no directory part, so there is nothing to walk
    path_elements = pathdir.split(os.path.sep) if pathdir else []

    for path_element in path_elements:

        # If dir on file path is not in repo, neither is file.
        try:
            rsub = rsub[path_element]
        except KeyError:
            return False

    return filePath in rsub

Usage:

file_found = fileInRepo(repo, 'documents/A')

This is very similar to EricP's code, but handles the case where the folder containing the file is not in the repo. EricP's function raises a KeyError in that case. This function returns False.

(I offered to edit EricP's code but was rejected.)
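
If relying on KeyError is acceptable, a shorter variant may be possible. The sketch below assumes that item lookup on the tree accepts a slash-separated path from the repository root and raises KeyError when nothing is found there; GitPython's Tree appears to behave this way, but verify it against your installed version. path_in_tree is a hypothetical helper name, not part of GitPython.

```python
def path_in_tree(tree, repo_path):
    """Return True if repo_path can be looked up in tree, False otherwise.

    Assumes tree[repo_path] raises KeyError for a missing path, which is
    an assumption about the tree object rather than a documented guarantee.
    """
    try:
        tree[repo_path]
    except KeyError:
        return False
    return True


# Intended use, given a GitPython Repo object named repo:
# file_found = path_in_tree(repo.head.commit.tree, 'documents/A')
```

Note that this returns True for directories as well as files, since any entry that can be looked up counts as present.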

Lucidity answered Sep 22 '22 21:09


Expanding on Bill's solution, here is a function that determines whether a file is in a repo:

import os

def fileInRepo(repo, path_to_file):
    '''
    repo is a gitPython Repo object
    path_to_file is the full path to the file from the repository root
    returns true if file is found in the repo at the specified path, false otherwise
    '''
    pathdir = os.path.dirname(path_to_file)

    # Build up reference to desired repo path
    rsub = repo.head.commit.tree
    for path_element in pathdir.split(os.path.sep):
        rsub = rsub[path_element]
    return path_to_file in rsub

Example usage:

file_found = fileInRepo(repo, 'documents/A')
EricP answered Sep 20 '22 21:09