I am running my Google Dataflow job on Google Cloud Platform (GCP). When I run this job locally it works fine, but when I run it on GCP I get this error: "java.lang.IllegalArgumentException: No filesystem found for scheme gs". I do have access to that Google Cloud URI; I can upload my jar file to it, and I can see the temporary files written by my local runs.
My job IDs in GCP:
2019-08-08_21_47_27-162804342585245230 (beam version:2.12.0)
2019-08-09_16_41_15-11728697820819900062 (beam version:2.14.0)
I have tried Beam versions 2.12.0 and 2.14.0; both fail with the same error:
java.lang.IllegalArgumentException: No filesystem found for scheme gs
at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:689)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:125)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:284)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:206)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:190)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:169)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:78)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:412)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:381)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:306)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This may be caused by a couple of issues if you build a "fat jar" that bundles all of your dependencies:

1. You must include the dependency org.apache.beam:google-cloud-platform-core to have the Beam GCS filesystem.
2. Inside your fat jar, you must preserve the META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar file with a line org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar. You can find this file in the jar from step 1. You will probably have many files with the same name in your dependencies, each registering a different Beam filesystem. You need to configure Maven or Gradle to merge these files as part of your build, or they will overwrite each other and the filesystems will not be registered properly. A sample Maven configuration is sketched after this list.
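With the Maven Shade Plugin, the ServicesResourceTransformer does this merging for you. A minimal sketch of the relevant pom.xml fragment (plugin version and any other shading settings you need are omitted):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Concatenates all META-INF/services files of the same name,
               so every FileSystemRegistrar (GCS, local, ...) stays registered -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>

If you use the Gradle Shadow plugin instead, its mergeServiceFiles() option does the equivalent.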
There is also one more reason for this exception: make sure you create the pipeline (e.g. Pipeline.create(options)) before you try to access files.
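Creating the pipeline is what registers the filesystems found on the classpath against your options; before that point only the local filesystem is available, so gs:// paths cannot be resolved. A minimal sketch of the required ordering (the gs:// path is a placeholder):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CreateBeforeAccess {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();

    // Pipeline.create(...) registers every FileSystemRegistrar on the
    // classpath, including the gs:// scheme from google-cloud-platform-core.
    Pipeline p = Pipeline.create(options);

    // Only now is it safe to resolve gs:// paths directly.
    ResourceId temp = FileSystems.matchNewResource("gs://my-bucket/temp", true /* isDirectory */);
    // ... build and run the pipeline ...
  }
}

If you need to resolve files without constructing a Pipeline, calling FileSystems.setDefaultPipelineOptions(options) performs the same registration.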
[GOLANG] In my case it was solved by adding the imports below for their side effects:
import (
	// Blank imports run each package's init(), which registers its
	// filesystem scheme with the Beam Go SDK.
	_ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs"   // gs:// paths
	_ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local" // local paths
	_ "github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/memfs" // in-memory filesystem
)