I'm setting up a Dataflow job whose workers need access to a private Bitbucket repository in order to install a library that processes the data. To grant the workers access, I created an SSH key pair (public and private) and managed to get the private key onto the Dataflow worker. When I try to pip install the package via git+ssh, I get the error Host key verification failed.
I have tried to look for the .ssh/known_hosts file on the Dataflow worker, but this is not as straightforward as on a regular VM.
Alternatively, I tried to set it up myself with the following commands, but this did not work either:
mkdir -p ~/.ssh
chmod 0700 ~/.ssh
ssh-keyscan bitbucket.org > ~/.ssh/known_hosts
I still get the Host key verification failed error.
Another suggested fix for this problem is to run ssh-keygen -R bitbucket.org, but then I get the following error:
mkstemp: No such file or directory
For the Dataflow Python SDK, you need to package your code with a setup.py. All the commands to be executed on worker start-up are run with subprocess.Popen. The list of commands is as follows:
CUSTOM_COMMANDS = [
# decrypt key encrypted key in repository via gcloud kms
['gcloud', '-v'],
['gcloud', 'kms', 'decrypt', '--location', 'global', '--keyring',
'bitbucketpackages', '--key', 'package', '--plaintext-file',
'bb_package_key_decrypted', '--ciphertext-file', 'bb_package_key'],
['chmod', '700', 'bb_package_key_decrypted'],
# install git & ssh
['apt-get', 'update'],
['apt-get', 'install', '-y', 'openssh-server'],
['apt-get', 'install', '-y', 'git'],
# add bitbucket.org as known host
['mkdir', '-p', '~/.ssh'],
['chmod', '0700', '~/.ssh'],
['ssh-keyscan', 'bitbucket.org', '>', '~/.ssh/known_hosts'],
# other attempts to fix it
# ['ssh-keygen', '-R', 'bitbucket.org']
# pip install
['sh', '-c', 'GIT_SSH_COMMAND="ssh -i ./bb_package_key_decrypted" pip install git+ssh://git@bitbucket.org/team/repo.git'],
]
Try updating the ssh-keyscan command to write to a temporary path, and then pass the known hosts file location as part of GIT_SSH_COMMAND. For example, I would update your script to be:
CUSTOM_COMMANDS = [
# decrypt key encrypted key in repository via gcloud kms
['gcloud', '-v'],
['gcloud', 'kms', 'decrypt', '--location', 'global', '--keyring',
'bitbucketpackages', '--key', 'package', '--plaintext-file',
'bb_package_key_decrypted', '--ciphertext-file', 'bb_package_key'],
['chmod', '700', 'bb_package_key_decrypted'],
# install git & ssh
['apt-get', 'update'],
['apt-get', 'install', '-y', 'openssh-server'],
['apt-get', 'install', '-y', 'git'],
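# add bitbucket.org as known host
# (when Popen gets an argument list without a shell, '~' and '>' are passed
# literally, so these commands are wrapped in sh -c)
['sh', '-c', 'mkdir -p ~/.ssh && chmod 0700 ~/.ssh'],
['sh', '-c', 'ssh-keyscan bitbucket.org > /tmp/bit_bucket_known_hosts'],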
# other attempts to fix it
# ['ssh-keygen', '-R', 'bitbucket.org']
# pip install
['sh', '-c', 'GIT_SSH_COMMAND="ssh -o UserKnownHostsFile=/tmp/bit_bucket_known_hosts -i ./bb_package_key_decrypted" pip install git+ssh://git@bitbucket.org/team/repo.git'],
]
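One thing worth checking is how each entry of CUSTOM_COMMANDS is actually executed. The question does not show the runner, so the sketch below assumes the list is passed to subprocess.Popen as an argument list without shell=True. In that case a '>' entry is handed to the program as a plain argument and no redirection happens, whereas the same command wrapped in sh -c does write the file. The run_command helper is hypothetical, and echo stands in for ssh-keyscan so the example runs without network access.

```python
import os
import shlex
import subprocess
import tempfile


def run_command(command):
    """Run one command given as an argument list (no shell) and return its output."""
    process = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    stdout_data, _ = process.communicate()
    if process.returncode != 0:
        raise RuntimeError(
            "Command %s failed with exit code %s: %s"
            % (command, process.returncode, stdout_data)
        )
    return stdout_data.decode()


# Hypothetical target file, standing in for the known hosts file.
target = os.path.join(tempfile.mkdtemp(), "known_hosts_demo")

# Without a shell, '>' is just another argument: echo prints it and no file is written.
literal_output = run_command(["echo", "bitbucket.org", ">", target])
file_written_without_shell = os.path.exists(target)

# Wrapped in sh -c, the shell performs the redirection and the file is created.
run_command(["sh", "-c", "echo bitbucket.org > " + shlex.quote(target)])
with open(target) as handle:
    file_contents = handle.read()
```

If the runner behaves this way, the same reasoning applies to '~', which is only expanded by a shell, so commands that rely on either feature need the sh -c form already used for the pip install step.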