Prevent large text files from being added to a commit when using GitHub

We want to prevent:

  • Very large text files (> 50MB per file) from being committed to git instead of git-lfs, as they inflate git history.
  • The problem is that 99% of these files are < 1MB and should be committed to git directly for better diffing.
  • The reason for the variance in size: these are YAML files, and they support binary serialization via base64 encoding.
  • The reason we can't reliably prevent binary serialization: this is a Unity project, and binary serialization is needed for various reasons.

Given:

  • GitHub hosting's lack of pre-receive hook support.
  • git-lfs's lack of file size attribute support.

Questions:

  1. How can we reliably prevent large files from being added to a commit?
  2. Can this be done through a config file in the repo so all users follow this rule gracefully?
  3. If not, can this be done by bash command aliasing, so trusted users see a warning message when they accidentally git add a large file that is not processed by git-lfs?

(Our environment is macOS. I have looked at many solutions and so far none satisfies our needs.)

asked by bitinn on Dec 09 '18 at 07:12



2 Answers

Alright, with help from CodeWizard and this SO answer, I managed to create a good guide myself:

First, set up your repo's core.hooksPath with:

git config core.hooksPath .githooks

Second, create this pre-commit file inside the .githooks folder so it can be tracked (gist link), and remember to give it execute permission with chmod +x.

#!/bin/bash
#
# pre-commit hook: block staged files that are not tracked by git-lfs
# and are larger than FILE_SIZE_LIMIT_KB.
# Called by "git commit" with no arguments. The hook exits with
# non-zero status after issuing an appropriate message if it wants
# to stop the commit.
#
# Requires bash: it uses a here-string (<<<), which plain sh lacks.

# Redirect output to stderr.
exec 1>&2

FILE_SIZE_LIMIT_KB=1024
CURRENT_DIR="$(pwd)"
COLOR='\033[01;33m'
NOCOLOR='\033[0m'
HAS_ERROR=""
COUNTER=0

# generate file extension filter from gitattributes for git-lfs tracked files
filter=$(grep filter=lfs .gitattributes 2>/dev/null | awk '{printf "-e .%s$ ", $1}')

# before git commit, check non git-lfs tracked files to limit size
# (--diff-filter=ACMR skips deleted files, which have no size on disk)
files=$(git diff --cached --name-only --diff-filter=ACMR | sort | uniq)
if [ -n "$filter" ]; then
    # $filter is left unquoted on purpose so it splits into "-e pattern" pairs
    files=$(printf '%s\n' "$files" | grep -v $filter)
fi
while read -r file; do
    if [ "$file" = "" ]; then
        continue
    fi
    file_path=$CURRENT_DIR/$file
    file_size=$(ls -l "$file_path" | awk '{print $5}')
    file_size_kb=$((file_size / 1024))
    if [ "$file_size_kb" -ge "$FILE_SIZE_LIMIT_KB" ]; then
        # %b expands the \033 escapes stored in the color variables
        printf '%b%s%b has size %sKB, over commit limit %sKB.\n' "$COLOR" "$file" "$NOCOLOR" "$file_size_kb" "$FILE_SIZE_LIMIT_KB"
        HAS_ERROR="YES"
        COUNTER=$((COUNTER + 1))
    fi
done <<< "$files"

# exit with error if any non-lfs tracked files are over file size limit
if [ "$HAS_ERROR" != "" ]; then
    echo "$COUNTER files are larger than permitted, please fix them before commit" >&2
    exit 1
fi

exit 0

Now, assuming you have both .gitattributes and git-lfs set up properly, this pre-commit hook will run whenever you try to git commit. It makes sure that every staged file not tracked by git-lfs (as specified in your .gitattributes) stays under the specified file size limit.
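If parsing .gitattributes by hand turns out to be fragile (nested .gitattributes files, patterns that are not simple extensions), one possible alternative is to ask git which filter applies to a path. The sketch below relies on the standard git check-attr command; is_lfs_tracked is a hypothetical helper name, not something git or git-lfs provides, and the hook above does not use it.

```shell
#!/bin/sh
# Sketch: decide whether a path is handled by the lfs filter by asking git,
# instead of grepping .gitattributes.
# is_lfs_tracked is a hypothetical helper name (an assumption for illustration).
is_lfs_tracked() {
    # "git check-attr filter -- <path>" prints "<path>: filter: <value>",
    # where <value> is "lfs" for git-lfs patterns and "unspecified" otherwise.
    case "$(git check-attr filter -- "$1")" in
        *": filter: lfs") return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside the hook's loop, a call such as is_lfs_tracked "$file" && continue could replace the grep -v filter, under the assumption that every LFS pattern sets filter=lfs.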

Any new users of your repo will need to set up core.hooksPath themselves, but beyond that, things should just work.
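The one-time setup for a fresh clone could be wrapped in a small script that new users run once. This is only a sketch: setup_hooks is a hypothetical helper name, and it assumes the hook lives at .githooks/pre-commit as described above.

```shell
#!/bin/sh
# Sketch of a one-time setup step for a fresh clone.
# setup_hooks is a hypothetical helper; .githooks/pre-commit is the
# location assumed from the steps above.
setup_hooks() (
    hooks_dir="${1:-.githooks}"
    # Work from the top level of the working tree (the body runs in a
    # subshell, so the caller's directory is left unchanged).
    cd "$(git rev-parse --show-toplevel)" || exit 1
    if [ ! -f "$hooks_dir/pre-commit" ]; then
        echo "no pre-commit hook found in $hooks_dir" >&2
        exit 1
    fi
    # Point this clone at the tracked hooks folder and make the hook runnable.
    git config core.hooksPath "$hooks_dir"
    chmod +x "$hooks_dir/pre-commit"
    echo "core.hooksPath is now: $(git config --get core.hooksPath)"
)
```

Calling setup_hooks with no arguments from anywhere inside the repo would apply the same two steps (git config and chmod +x) that are otherwise done by hand.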

Hope this helps other Unity developers fighting with growing git repo size!

answered by bitinn on Oct 05 '22 at 00:10


  • How can we reliably prevent large files from being added to commit?
  • Can this be done through a config file in the repo so all users follow this rule gracefully?

Since GitHub doesn't support server-side hooks, you can use client-side hooks. As you are probably aware, those hooks can be bypassed or disabled with no problem, but still, this is a good way to do it.
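Because a client-side hook can be skipped (for example with git commit --no-verify), it may help to have an after-the-fact check as well, run by hand or in CI. The sketch below lists blobs in HEAD above a size limit using git ls-tree -r -l. Files stored through git-lfs appear in the tree as small pointer files, so anything it reports was committed to git directly. large_blobs is a hypothetical helper name and the 1 MiB default is an assumption.

```shell
#!/bin/sh
# Sketch: list blobs in HEAD that are larger than a byte limit.
# large_blobs is a hypothetical helper; the default limit is an assumption.
large_blobs() {
    limit_bytes="${1:-1048576}"
    # "git ls-tree -r -l" prints: <mode> <type> <object> <size>TAB<path>
    git ls-tree -r -l HEAD | awk -v limit="$limit_bytes" -F '\t' '
        {
            split($1, meta, " ")   # meta: mode, type, object, size
            if (meta[2] == "blob" && meta[4] + 0 > limit)
                printf "%s\t%s\n", meta[4], $2
        }'
}
```

Running large_blobs 1048576 prints one "size, tab, path" line per oversized file and nothing when the tree is clean, so a CI step could fail when the output is non-empty.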

core.hooksPath

Git v2.9 added the core.hooksPath setting, which lets client-side hooks live in a folder of your choice. Prior to that, hooks had to be placed inside the .git folder.

This allows you to write scripts and put them anywhere. I assume you know what hooks are, but if not, feel free to ask.


How to do it?

Usually, you place the hooks inside your repo (or any other common folder).

# set the hooks path. for git config, the default location is --local
# so this configuration is locally per project
git config core.hooksPath .githooks
answered by CodeWizard on Oct 05 '22 at 00:10