This question might seem like a duplicate of this.
I am trying to run Apache Beam python pipeline using flink on an offline instance of Kubernetes. However, since I have user code with external dependencies, I am using the Python SDK harness as an External Service - which is causing errors (described below).
The kubernetes manifest I use to launch the beam python SDK:
apiVersion: apps/v1
kind: Deployment
metadata:
name: beam-sdk
spec:
replicas: 1
selector:
matchLabels:
app: beam
component: python-beam-sdk
template:
metadata:
labels:
app: beam
component: python-beam-sdk
spec:
hostNetwork: True
containers:
- name: python-beam-sdk
image: apachebeam/python3.7_sdk:latest
imagePullPolicy: "Never"
command: ["/opt/apache/beam/boot", "--worker_pool"]
ports:
- containerPort: 50000
name: yay
apiVersion: v1
kind: Service
metadata:
name: beam-python-service
spec:
type: NodePort
ports:
- name: yay
port: 50000
targetPort: 50000
selector:
app: beam
component: python-beam-sdk
When I launch my pipeline with the following options:
beam_options = PipelineOptions([
"--runner=FlinkRunner",
"--flink_version=1.9",
"--flink_master=10.101.28.28:8081",
"--environment_type=EXTERNAL",
"--environment_config=10.97.176.105:50000",
"--setup_file=./setup.py"
])
I get the following error message (within the python sdk service):
NAME READY STATUS RESTARTS AGE
beam-sdk-666779599c-w65g5 1/1 Running 1 4d20h
flink-jobmanager-74d444cccf-m4g8k 1/1 Running 1 4d20h
flink-taskmanager-5487cc9bc9-fsbts 1/1 Running 2 4d20h
flink-taskmanager-5487cc9bc9-zmnv7 1/1 Running 2 4d20h
(base) [~]$ sudo kubectl logs -f beam-sdk-666779599c-w65g5
2020/02/26 07:56:44 Starting worker pool 1: python -m apache_beam.runners.worker.worker_pool_main --service_port=50000 --container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1', '--logging_endpoint=localhost:39283', '--artifact_endpoint=localhost:41533', '--provision_endpoint=localhost:42233', '--control_endpoint=localhost:44977']
2020/02/26 09:09:07 Initializing python harness: /opt/apache/beam/boot --id=1-1 --logging_endpoint=localhost:39283 --artifact_endpoint=localhost:41533 --provision_endpoint=localhost:42233 --control_endpoint=localhost:44977
2020/02/26 09:11:07 Failed to obtain provisioning information: failed to dial server at localhost:42233
caused by:
context deadline exceeded
I have no idea what the logging- or artifact endpoint (etc.) is. And by inspecting the source code it seems like that the endpoints has been hard-coded to be located at localhost.
(You said in a comment that the answer to the referenced post is valid, so I'll just address the specific error you ran into in case someone else hits it.)
Your understanding is correct; the logging, artifact, etc. endpoints are essentially hardcoded to use localhost. These endpoints are meant to be only used internally by Beam and are not configurable. So the Beam worker is implicitly assumed to be on the same host as the Flink task manager. Typically, this is accomplished by making the Beam worker pool a sidecar of the Flink task manager pod, rather than a separate service.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With