We're facing issues during Dataflow job deployment.
We use CustomCommands to install a private repo on the workers, but we now get an error in the worker-startup
logs of our jobs:

Running command: ['pip', 'install', 'git+ssh://[email protected]/[email protected]']
Command output:
Traceback (most recent call last):
  File "/usr/local/bin/pip", line 6, in <module>
    from pip._internal import main
ModuleNotFoundError: No module named 'pip'

This code was working, but it stopped working after our last deploy of the service on Friday.
We use a setup.py with customCommands which are run during worker startup (code example from the official repo here) to run pip install git+ssh://[email protected]/[email protected] (see the commands below):

CUSTOM_COMMANDS = [
    # retrieve ssh key
    ["gsutil", "cp", "gs://{bucket_name}/encrypted_python_repo_ssh_key".format(bucket_name=credentials_bucket), "encrypted_key"],
    [
        "gcloud",
        "kms",
        "decrypt",
        "--location",
        "global",
        "--keyring",
        project,
        "--key",
        project,
        "--plaintext-file",
        "decrypted_key",
        "--ciphertext-file",
        "encrypted_key",
    ],
    ["chmod", "700", "decrypted_key"],
    # install git & ssh
    ["apt-get", "update"],
    ["apt-get", "install", "-y", "openssh-server"],
    ["apt-get", "install", "-y", "git"],
    # Add an ssh config which specifies the location of the key & the host
    [
        "gsutil",
        "cp",
        "gs://{bucket_name}/ssh_config_gcloud".format(bucket_name=credentials_bucket),
        "~/.ssh/config",
    ],
    [
        "pip",
        "install",
        "git+ssh://[email protected]/[email protected]",
    ],
]
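For context, CUSTOM_COMMANDS lists like this are typically executed from setup.py by spawning each command as a subprocess during worker startup, following the pattern in the official Apache Beam juliaset example. A minimal sketch of that runner (function names and the placeholder command are illustrative, not the exact code from this pipeline):

```python
# Sketch of how setup.py typically runs each entry of CUSTOM_COMMANDS as
# a subprocess during worker startup; a non-zero exit code aborts startup.
import subprocess
import sys

def run_custom_command(command_list):
    print("Running command: %s" % command_list)
    p = subprocess.Popen(
        command_list,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    stdout_data, _ = p.communicate()
    print("Command output: %s" % stdout_data)
    if p.returncode != 0:
        raise RuntimeError(
            "Command %s failed: exit code %s" % (command_list, p.returncode))

# Placeholder list; the real CUSTOM_COMMANDS are shown above.
for command in [[sys.executable, "--version"]]:
    run_custom_command(command)
```

This also explains the log format in the error above: the traceback is the captured stdout of the failing `pip install` subprocess.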
We tried adding

apt-get --reinstall install -y python-setuptools python-wheel python-pip

(and other variations, like curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && python3 get-pip.py --force-reinstall) to the CustomCommands, but saw no specific improvement. Our Dockerfile:

FROM gcr.io/google-appengine/python
RUN apt-get update && apt-get install -y openssh-server
RUN virtualenv /env -p python3.7
# Setting these environment variables is the same as running
# source /env/bin/activate.
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
# Set credentials for git, then run pip to install all
# dependencies into the virtualenv.
... specify SSH key & host to allow the private git repo pull
# Add the application source code.
ADD . /app
RUN pip install -r /app/requirements.txt && python /app/setup.py install && python /app/setup.py build
CMD gunicorn -b :$PORT main:app
Any idea how to solve this issue, or is there any workaround available?
Thanks for your help!
This seems mostly due to the local state of the machine, or of our computers.
After running some commands like python setup.py install or python setup.py build, I'm now unable to deploy jobs anymore (facing the same error during worker-startup as when deployed by the service), but my colleague is still able to deploy jobs that run (same code, same branch, except for directories excluded via .gitignore, like build, dist, ...). In his case, the CustomCommands are not run on job deployment (but the workers are still able to use the locally packaged pipeline).
Is there any way to specify a compiled package for the workers to use? I was not able to find any documentation on that...
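Since the behavior diverges between machines that have run `python setup.py install` / `build` and those that haven't, one thing worth trying is clearing the stale local build artifacts before deploying again. A small sketch, assuming the default setuptools output directories (the egg-info suffix check is an assumption, not from the original post):

```python
# Remove stale setuptools build artifacts (build/, dist/, *.egg-info)
# that `python setup.py install` / `build` leave behind, so the next
# deployment packages the pipeline from a clean tree.
import os
import shutil

def clean_build_artifacts(base_dir="."):
    for name in ("build", "dist"):
        path = os.path.join(base_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
    # setup.py also leaves a <package>.egg-info directory next to the sources.
    for entry in os.listdir(base_dir):
        if entry.endswith(".egg-info"):
            shutil.rmtree(os.path.join(base_dir, entry))
```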
As we were not able to pull private code from the Dataflow workers, we used the following workaround: build a wheel locally with

python setup.py sdist bdist_wheel

which produces lib/my-package-1.0.0-py3-none-any.whl, and pass it to the workers as an extra package:

pipeline_options = PipelineOptions()
pipeline_options.view_as(SetupOptions).setup_file = "./setup.py"
pipeline_options.view_as(SetupOptions).extra_packages = ["./lib/my-package-1.0.0-py3-none-any.whl"]
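The wheel-building half of this workaround can be scripted so the deploy step always picks up the freshly built artifact instead of a hard-coded filename. A minimal sketch, assuming the package's setup.py lives in `package_dir` and that the dist-dir layout shown above is used (these paths are assumptions, not from the original):

```python
# Build the private package into a wheel and return its path, so the
# result can be passed to SetupOptions.extra_packages as shown above.
import glob
import os
import subprocess
import sys

def build_wheel(package_dir, dist_dir="lib"):
    """Run `python setup.py sdist bdist_wheel` and return the wheel path."""
    subprocess.check_call(
        [sys.executable, "setup.py", "sdist", "bdist_wheel",
         "--dist-dir", dist_dir],
        cwd=package_dir,
    )
    wheels = glob.glob(os.path.join(package_dir, dist_dir, "*.whl"))
    if not wheels:
        raise RuntimeError("no wheel produced in %s" % dist_dir)
    return wheels[0]
```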
For anything but trivial, public dependencies, I would recommend using custom containers and installing all the dependencies ahead of time.
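With that approach, the image is built once from a Dockerfile that already runs the `pip install` for the private repo, and the job only has to point at it. A sketch of the flags involved, assuming the `--sdk_container_image` option from the Dataflow custom-container docs (the image URI and project/region values are placeholders, not from the original pipeline):

```python
# Flags for running the job on a prebuilt worker image, to be passed to
# PipelineOptions(flags) as in the workaround above. With the private
# dependency baked into the image, no CustomCommands run at worker startup.
flags = [
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=europe-west1",
    "--sdk_container_image=gcr.io/my-project/beam-worker:latest",
]
```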