I am trying to access Kafka and 3rd-party services (e.g., InfluxDB) running in GKE, from a Dataflow pipeline.
I have a DNS server for service discovery, also running in GKE. I also have a route in my network to access the GKE IP range from Dataflow instances, and this is working fine. I can manually run nslookup from the Dataflow instances using my custom server without issues.
However, I cannot find a proper way to set up an additional DNS server when running my Dataflow pipeline. How could I achieve that, so that KafkaIO and similar sources/writers can resolve hostnames against my custom DNS?
The sun.net.spi.nameservice.nameservers system property is tricky to use, because it must be set very early on, before the name service is statically instantiated. I would normally pass it with java -D, but Dataflow launches the JVM and runs the code itself, so I do not control the command line.
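For illustration, the programmatic equivalent of the java -D flags might look like the sketch below. It assumes a Java 8 or earlier runtime (these sun.net.spi.nameservice properties were removed in JDK 9), and the class name and the 10.0.0.10 address are hypothetical placeholders. Whether this runs early enough on a Dataflow worker, before the JVM has already resolved a hostname, is exactly the open question, so treat it as a sketch rather than a working fix.

```java
final class CustomDnsProperties {
    // Hypothetical address of the custom DNS server running in GKE.
    static final String CUSTOM_DNS_SERVER = "10.0.0.10";

    /**
     * Sets the JDK 8 name-service properties. This only has an effect if it
     * runs before the JVM performs its first InetAddress lookup.
     */
    static void apply(String nameservers) {
        // Select the JNDI-based DNS provider instead of the OS resolver.
        System.setProperty("sun.net.spi.nameservice.provider.1", "dns,sun");
        // Comma-separated list of DNS server IP addresses.
        System.setProperty("sun.net.spi.nameservice.nameservers", nameservers);
    }

    public static void main(String[] args) {
        apply(CUSTOM_DNS_SERVER);
        System.out.println(System.getProperty("sun.net.spi.nameservice.nameservers"));
    }
}
```

Because the nameservers property takes a comma-separated list, the existing resolvers would have to be listed explicitly alongside the custom one. The property replaces the resolver list rather than appending to it.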
In addition, I would not want to replace the system's resolvers outright, but merely append a new one to the GCP project-specific resolvers that the instance comes pre-configured with.
Finally, I have not found any way to use a startup script with the Dataflow instances, as I could with a regular GCE instance.
I can't think of a way today of specifying a custom DNS in a VM other than editing the /etc/resolv.conf file on the box [1]. I don't know if it is possible to share the default network. If it is, machines are available at hostName.c.[PROJECT_ID].internal, which may serve your purpose if hostName is stable [2].
[1] https://cloud.google.com/compute/docs/networking#internal_dns_and_resolvconf
[2] https://cloud.google.com/compute/docs/networking
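As a rough sketch of the internal-name approach, the helper below builds a name in the hostName.c.[PROJECT_ID].internal form and tries to resolve it with the JVM's default resolver. The class name and the kafka-0 and my-project values are hypothetical placeholders. Note that these names refer to VM instances, so this only helps if the service is reachable under a stable instance hostname.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class InternalDnsName {
    /** Builds hostName.c.[PROJECT_ID].internal from its two parts. */
    static String of(String hostName, String projectId) {
        return hostName + ".c." + projectId + ".internal";
    }

    public static void main(String[] args) {
        // Placeholder values; substitute a real instance name and project ID.
        String fqdn = of("kafka-0", "my-project");
        try {
            // Uses whatever resolvers the machine is already configured with.
            InetAddress address = InetAddress.getByName(fqdn);
            System.out.println(fqdn + " -> " + address.getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println(fqdn + " did not resolve from this machine");
        }
    }
}
```

Running this from a Dataflow worker (for example inside a DoFn setup method) would show whether the pre-configured resolvers already cover the hosts you need, without touching /etc/resolv.conf.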