I didn't configure anything special in the project, and I get this error whenever I run my job: 'The network default doesn't have rules that open TCP ports 1-65535 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: No firewall rules associated with your network.'
import apache_beam as beam
from apache_beam.options.pipeline_options import (
    PipelineOptions, GoogleCloudOptions, StandardOptions, SetupOptions, WorkerOptions)

p_options = PipelineOptions()
google_cloud_options = p_options.view_as(GoogleCloudOptions)
google_cloud_options.region = 'europe-west1'
google_cloud_options.project = 'my-project'
google_cloud_options.job_name = 'rim'
google_cloud_options.staging_location = 'gs://my-bucket/binaries'
google_cloud_options.temp_location = 'gs://my-bucket/temp'
p_options.view_as(StandardOptions).runner = 'DataflowRunner'
p_options.view_as(SetupOptions).save_main_session = True
p_options.view_as(StandardOptions).streaming = True
# Workers are attached to a non-default subnetwork:
p_options.view_as(WorkerOptions).subnetwork = 'regions/europe-west1/subnetworks/test'
p = beam.Pipeline(options=p_options)
I also tried specifying --network 'test' on the command line, since the pipeline does not use the default network configuration.
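For reference, the same setting can be expressed programmatically through WorkerOptions instead of the --network command-line flag. This is only a minimal sketch: 'test' is simply the network name used in this question, and the subnetwork path mirrors the one set above.
from apache_beam.options.pipeline_options import PipelineOptions, WorkerOptions

p_options = PipelineOptions()
worker_options = p_options.view_as(WorkerOptions)
worker_options.network = 'test'  # same effect as passing --network test on the command line
worker_options.subnetwork = 'regions/europe-west1/subnetworks/test'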
Hi @sɐunıɔןɐqɐp, there is a Clone option at the top of the Dataflow jobs dashboard. It simply clones the current job and lets you edit and run it again.
Apache Beam is an open source, unified programming model that enables you to develop both batch and streaming pipelines. You create your pipelines with an Apache Beam program and then run them on the Dataflow service.
Dataflow has two data pipeline types: streaming and batch. Both types of pipelines run jobs that are defined in Dataflow templates. A streaming data pipeline runs a Dataflow streaming job immediately after it is created. A batch data pipeline runs a Dataflow batch job on a user-defined schedule.
staging_location: a Cloud Storage path for Dataflow to stage code packages needed by the workers executing the job. temp_location: a Cloud Storage path for Dataflow to stage temporary job files created during the execution of the pipeline.
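To make the distinction concrete, here is a minimal sketch of supplying both locations as pipeline options; the project, region, and bucket values are just the placeholders used in the question above.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project',
    '--region=europe-west1',
    '--staging_location=gs://my-bucket/binaries',  # code packages staged for the workers
    '--temp_location=gs://my-bucket/temp',         # temporary files created while the job runs
])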
It looks like the default firewall rules in your project were modified, and Dataflow detected this and prevented your job from launching. Could you verify that your project's firewall rules were not modified? Please take a look at the documentation here; it also includes a command to restore the firewall rules:
gcloud compute firewall-rules create [FIREWALL_RULE_NAME] \
--network [NETWORK] \
--action allow \
--direction ingress \
--target-tags dataflow \
--source-tags dataflow \
--priority 0 \
--rules tcp:1-65535
Pick a name for the firewall rule and provide a network name, then pass that network name with --network when you launch the Dataflow job. If you have a network named 'default', Dataflow will try to use it automatically, so you won't need to pass --network. If you've deleted that network, you may wish to recreate it.
As of now (up to Apache Beam version 2.19.0), Dataflow provides no way to set a custom network tag on its worker VMs. Instead, when creating the firewall rule, target the 'dataflow' tag, which Dataflow assigns to its workers:
gcloud compute firewall-rules create FIREWALL_RULE_NAME \
--network NETWORK \
--action allow \
--direction DIRECTION \
--target-tags dataflow \
--source-tags dataflow \
--priority 0 \
--rules tcp:12345-12346
See this page for more details: https://cloud.google.com/dataflow/docs/guides/routes-firewall