I have a submodule in my git repository, and my directory structure is like:
app
-- folder1
-- folder2
-- submodule @5855
I have deployed my code on AWS using the auto-deploy service. Now, on the server, I have the code in the parent directory, but the submodule directories are empty.
Q1) How can I get the data into the submodules? My repository on the server is not a git repository. Do I need to convert it into a git repo first and then run submodule commands to fetch it?
Q2) How can I automate the submodule deployment as well?
Thanks
A git submodule is a record within a host git repository that points to a specific commit in another, external repository. Submodules are static: they track specific commits, not refs or branches, and they are not automatically updated when the host repository is updated.
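That "record" is just two pieces of data, which a throwaway local demo can make concrete (the repo names "host" and "lib" are made up for illustration):

```shell
# Throwaway local demo - "host" and "lib" are made-up names.
set -e
dir=$(mktemp -d) && cd "$dir"
G="git -c user.email=demo@example.com -c user.name=demo -c protocol.file.allow=always -c init.defaultBranch=master"

# An "external" repository that will become the submodule
$G init -q lib
$G -C lib commit -q --allow-empty -m "lib v1"

# The host repository records only a pointer to it
$G init -q host
cd host
$G commit -q --allow-empty -m "init"
$G submodule add ../lib lib
$G commit -qm "add submodule at a pinned commit"

cat .gitmodules        # piece 1: the path -> URL mapping
$G submodule status    # piece 2: the exact commit the host is pinned to
$G ls-tree HEAD lib    # mode 160000 = gitlink (a commit pointer, not files)
```

Note that nothing in the host's own object database contains lib's files; a plain file copy of the host working tree (which is what a deploy artifact is) therefore carries only an empty `lib` directory.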
Edit: CodeBuild now has a "submodules" flag: https://docs.aws.amazon.com/codebuild/latest/APIReference/API_GitSubmodulesConfig.html
Here's what worked for me.

We're going to reinitialize the git repository and then trigger a submodule clone during the build phase of our deploy, essentially patching in support for submodules in CodePipeline / CodeBuild:

aws ssm put-parameter --name build_ssh_key --type String --value "$(cat id_rsa)"

Ideally use SecureString instead of String, but the guide I was following simply used String, so I'm not sure if the command line will require any extra parameters. Then make your buildspec.yml look like the following:
version: 0.2
env:
  parameter-store:
    build_ssh_key: "build_ssh_key"
phases:
  install:
    commands:
      - mkdir -p ~/.ssh
      - echo "$build_ssh_key" > ~/.ssh/id_rsa
      - chmod 600 ~/.ssh/id_rsa
      - ssh-keygen -F github.com || ssh-keyscan github.com >> ~/.ssh/known_hosts
      - git config --global url."git@github.com:".insteadOf "https://github.com/"
      - git init
      - git remote add origin <Your repo URL here, using the git protocol>
      - git fetch
      - git checkout -t origin/master
      - git submodule init
      - git submodule update --recursive
  build:
    commands:
      - echo '...replace with real build commands...'
artifacts:
  files:
    - '**/*'
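The init/fetch/checkout/submodule dance above can be simulated end to end with local repositories, which is a cheap way to convince yourself it works before touching CodeBuild ("origin_repo" and "lib" are throwaway stand-ins for your GitHub repos; in CodeBuild the remote URL and SSH setup come from the earlier install commands):

```shell
# Local simulation of the patch-in trick - all repo names are made up.
set -e
work=$(mktemp -d) && cd "$work"
G="git -c user.email=ci@example.com -c user.name=ci -c protocol.file.allow=always -c init.defaultBranch=master"

# Upstream submodule repo
$G init -q lib
echo "lib code" > lib/lib.txt
$G -C lib add lib.txt
$G -C lib commit -qm "lib"

# Upstream host repo with the submodule recorded
$G init -q origin_repo
cd origin_repo
echo "app code" > app.txt
$G add app.txt
$G submodule add ../lib lib
$G commit -qm "host with submodule"
cd ..

# What the deploy hands you: raw files without .git, submodule directory empty
mkdir deployed deployed/lib
cp origin_repo/app.txt origin_repo/.gitmodules deployed/
cd deployed

# The patch-in: reinitialize, fetch, check out, then pull submodules
$G init -q
$G remote add origin "$work/origin_repo"
$G fetch -q
$G checkout -qf -t origin/master
$G submodule update --init --recursive
cat lib/lib.txt   # submodule contents are now present
```

`submodule update --init` combines the buildspec's `submodule init` and `submodule update` into one step; the `-f` on checkout is needed because the deployed files already exist as untracked copies of the tracked ones.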
I ran into this issue myself and, thanks to the awesome suggestions by @matt-bucci I was able to come up with what seems like a robust solution.
My specific use-case is slightly different - I am using Lambda Layers to reduce lambda redundancy, but still need to include the layers as submodules in the Lambda function repositories so that CodeBuild can build and test PRs. I am also using CodePipeline to assist with continuous delivery - so I need a system that works with both CodePipeline and CodeBuild by itself
I created a new SSH key for use by a "machine user" following these instructions. I am using a machine user in this case so that a new ssh key doesn't need to be generated for every project, as well as for potential support of multiple private submodules
I stored the private key in the AWS Parameter Store as a SecureString. This doesn't change anything within CodeBuild, which is smart enough to decrypt the key automatically
I gave the "codebuild" role the AWS managed policy AmazonSSMReadOnlyAccess, allowing CodeBuild to read the private key
I made my buildspec.yml file, using a bunch of the commands suggested by @matt-bucci, as well as some new ones
# This example buildspec will enable submodules for CodeBuild projects that are both
# triggered directly and via CodePipeline
#
# This buildspec is designed with help from Stack Overflow:
# https://stackoverflow.com/questions/42712542/how-to-auto-deploying-git-repositories-with-submodules-on-aws
version: 0.2 # Always use version 0.2
env:
  variables:
    # The remote origin that will be used if building through CodePipeline
    remote_origin: "git@github.com:your/gitUri"
  parameter-store:
    # The SSH RSA key used by our machine user
    ssh_key: "ssh_key_name_goes_here"
phases:
  install:
    commands:
      # Add the "machine user's" ssh key and activate it - this allows us to get private (sub) repositories
      - mkdir -p ~/.ssh                # Ensure the .ssh directory exists
      - echo "$ssh_key" > ~/.ssh/ssh_key # Save the machine user's private key
      - chmod 600 ~/.ssh/ssh_key       # Adjust the private key permissions (avoids a critical error)
      - eval "$(ssh-agent -s)"         # Initialize the ssh agent
      - ssh-add ~/.ssh/ssh_key         # Add the machine user's key to the ssh "keychain"
      # SSH credentials have been set up. Check for a .git directory to determine if we need to set up our git package
      - |
        if [ ! -d ".git" ]; then
          git init                                             # Initialize git
          git remote add origin "$remote_origin"               # Add the remote origin so we can fetch
          git fetch                                            # Get all the things
          git checkout -f "$CODEBUILD_RESOLVED_SOURCE_VERSION" # Check out the specific commit we are building
        fi
      # Now that setup is complete, get submodules
      - git submodule init
      - git submodule update --recursive
      # Additional install steps... (npm install, etc)
  build:
    commands:
      # Build commands...
artifacts:
  files:
    # Artifact definitions...
This install script performs three discrete steps:
1. It installs and enables the SSH private key used to access private repositories.
2. It determines whether there is a .git folder; if there isn't, the script initializes git and checks out the exact commit being built. Note: according to the AWS docs, the $CODEBUILD_RESOLVED_SOURCE_VERSION environment variable is not guaranteed to be present in CodePipeline builds. However, I have not seen this fail.
3. Finally, it actually gets the submodules.
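Since the docs say $CODEBUILD_RESOLVED_SOURCE_VERSION may be absent, the checkout can be guarded cheaply. A hedged sketch (the variable name is real CodeBuild; falling back to the fetched branch head is my assumption), simulated here with a throwaway local repo:

```shell
# Throwaway local repo to simulate the fallback; in the real buildspec the
# fetch has already happened and the fallback ref would be e.g. origin/master.
set -e
cd "$(mktemp -d)"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "init"

unset CODEBUILD_RESOLVED_SOURCE_VERSION    # simulate a build where it is absent
commit="${CODEBUILD_RESOLVED_SOURCE_VERSION:-$(git rev-parse HEAD)}"
git checkout -qf "$commit"                 # detached checkout of the resolved commit
```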
Obviously, this is not a great solution to this problem. However, it's the best I can come up with given the (unnecessary) limitations of CodePipeline. A side effect of this process is that the "Source" CodePipeline stage is effectively worthless, since we just overwrite the archived source files; it's only used to listen for changes to the repository
Better functionality has been requested for over 2 years now: https://forums.aws.amazon.com/thread.jspa?threadID=248267
I realized (the hard way) that my previous answer didn't support CodePipeline builds, only builds run directly through CodeBuild. When CodeBuild responds to a GitHub webhook, it clones the entire GitHub repository, including the .git folder
However, when using CodePipeline, the "Source" action clones the repository, checks out the appropriate branch, then archives the raw files without the .git folder. This means that we do have to reinitialize the git repository to get access to submodules
After banging my head against this all day, I've found a simple solution (for CodePipeline) that doesn't require any SSH key juggling in the buildspec. I'm using Bitbucket, but I would think this works for other providers as well. I'm also cloning my submodule via HTTPS; I'm not sure whether that's a requirement.
1. Configure your source to do a full clone of the repository. This passes along the git metadata that you need.
2. Configure your build role to add a customer-managed UseConnection permission, giving your build action access to the credentials you configured for your source. Documentation from AWS: https://docs.aws.amazon.com/codepipeline/latest/userguide/troubleshooting.html#codebuild-role-connections
3. Set up your env to include git-credential-helper: yes and clone the submodule in your buildspec.yml.
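A minimal buildspec.yml sketch of that last step (the env `git-credential-helper` key is documented CodeBuild buildspec syntax; the phases and the submodule command are illustrative, not the answerer's exact file):

```yaml
version: 0.2

env:
  # Reuses the Source connection's credentials for git operations in the build
  git-credential-helper: yes

phases:
  install:
    commands:
      # Works because the Source action was set to "Full clone", so .git exists
      - git submodule update --init --recursive
  build:
    commands:
      - echo '...replace with real build commands...'
```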
And that's it! The submodule will be available for the build, without having to do a bunch of key configuration for every submodule you want to use.
Maybe a good addition to the documentation if this ends up being useful for people.