
Google Dataflow "No filesystem found for scheme gs"

I'm trying to run a Google Dataflow application, but it throws this exception:

java.lang.IllegalArgumentException: No filesystem found for scheme gs
    at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:459)
    at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:529)
    at org.apache.beam.sdk.io.FileBasedSink.convertToFileResourceIfPossible(FileBasedSink.java:213)
    at org.apache.beam.sdk.io.TextIO$TypedWrite.to(TextIO.java:700)
    at org.apache.beam.sdk.io.TextIO$Write.to(TextIO.java:1028)
    at br.com.sulamerica.mecsas.ExportacaoDadosPipeline.main(ExportacaoDadosPipeline.java:52)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
    at java.lang.Thread.run(Thread.java:748)

This is an excerpt of my pipeline code:

Pipeline.create()
        .apply(PubsubIO.readStrings().fromSubscription(subscription))
        .apply(new KeyExportacaoDadosToEntityTransform())
        .apply(new ListKeyEmpresaSelecionadasTransform())
        .apply(ParDo.of(new DoFn<List<Entity>, String>() {
            @ProcessElement
            public void processElement(ProcessContext c){
                c.output(
                    c.element().stream()
                        .map(e-> e.getString("dscRazaoSocial"))
                        .collect(Collectors.joining("\r\n"))
                );
            }
        }))
        .apply(TextIO.write().to("gs://<my bucket>"))
        .getPipeline()
    .run();

And this is the command I use to run my pipeline:

mvn -Pdataflow-runner compile exec:java \
  -Dexec.mainClass=br.com.xpto.foo.ExportacaoDadosPipeline \
  -Dexec.args="--project=<projectID>\
  --stagingLocation=gs://dataflow-xpto/exportacao/staging \
  --output=gs://dataflow-xpto/exportacao/output \
  --runner=DataflowRunner"  
William Miranda de Jesus asked Dec 13 '18 11:12


2 Answers

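I was grappling with the same issue. If you are using Maven to build the executable jar, your shade plugin configuration should include the ServicesResourceTransformer. It merges the META-INF/services files from all dependencies, which is how Beam discovers the filesystem for the gs scheme; without it, the bundled jar can lose that entry. The configuration should look like this: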

<configuration>
    <transformers>
        <!-- merge META-INF/services files from all dependencies -->
        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        <!-- add Main-Class to manifest file -->
        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.main.Application</mainClass>
        </transformer>
    </transformers>
</configuration>
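
Why merging service files matters can be sketched in plain Java. This is only a toy model and not the shade plugin's actual code: the class ServiceFileMergeSketch and its methods are hypothetical, the choice to keep the first copy is an assumption of the sketch, and the two registrar class names are the ones Beam's core and GCP modules ship as far as I know.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical): how service files from several jars end up in one bundled jar. */
final class ServiceFileMergeSketch {

    /** One service file as found in a dependency jar: its path and the providers it lists. */
    static final class ServiceFile {
        final String path;
        final List<String> providers;

        ServiceFile(String path, List<String> providers) {
            this.path = path;
            this.providers = providers;
        }
    }

    /** No merging: only one copy per path survives (assumed here to be the first one seen). */
    static Map<String, Set<String>> keepFirst(List<ServiceFile> files) {
        Map<String, Set<String>> out = new LinkedHashMap<>();
        for (ServiceFile f : files) {
            out.putIfAbsent(f.path, new LinkedHashSet<>(f.providers));
        }
        return out;
    }

    /** Merging: provider lists of all copies with the same path are combined. */
    static Map<String, Set<String>> merge(List<ServiceFile> files) {
        Map<String, Set<String>> out = new LinkedHashMap<>();
        for (ServiceFile f : files) {
            out.computeIfAbsent(f.path, k -> new LinkedHashSet<>()).addAll(f.providers);
        }
        return out;
    }

    public static void main(String[] args) {
        String path = "META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar";
        List<ServiceFile> jars = Arrays.asList(
                new ServiceFile(path, Collections.singletonList(
                        "org.apache.beam.sdk.io.LocalFileSystemRegistrar")),
                new ServiceFile(path, Collections.singletonList(
                        "org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar")));
        System.out.println("keep first: " + keepFirst(jars).get(path));
        System.out.println("merged:     " + merge(jars).get(path));
    }
}
```

In the unmerged case the GCS registrar entry is gone from the bundled jar, which matches the "No filesystem found for scheme gs" symptom.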
Rana answered Oct 19 '22 09:10


I recently ran into this issue while working on an Apache Beam Java pipeline built with Gradle.

Applying the Gradle Shadow plugin 'com.github.johnrengelman.shadow' resolved it.

Here is my build.gradle file for future reference:

plugins {
    id 'java'
    // The plugins block applies Shadow; no buildscript block or "apply plugin" lines are needed.
    id 'com.github.johnrengelman.shadow' version '5.1.0'
}

sourceCompatibility = 1.8

repositories {
    mavenLocal()
    mavenCentral()
    jcenter()
}

dependencies {
    // your dependencies here
}

jar {
    manifest {
        attributes "Main-Class": "your_main_class_with_package"
    }

    from {
        configurations.compile.collect { it.isDirectory() ? it : zipTree(it) }
    }
}

shadowJar {
    // Merge META-INF/services files so the registrar entries of all dependencies are kept.
    mergeServiceFiles()
}

You should then see the shadowJar task under the shadow group in IntelliJ's Gradle tool window. Enjoy!
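
To check whether a bundled jar still carries the registrar entries, a small diagnostic along these lines may help. It is a sketch using only the JDK: the class name RegistrarCheck is hypothetical, and it only reads the service files visible on the classpath without loading any Beam classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/** Diagnostic sketch (hypothetical class): lists provider names found in service files. */
final class RegistrarCheck {

    static final String SERVICE_FILE =
            "META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar";

    /** Reads every copy of the given resource and returns the provider class names listed in them. */
    static List<String> listProviders(ClassLoader loader, String resource) {
        List<String> providers = new ArrayList<>();
        try {
            Enumeration<URL> urls = loader.getResources(resource);
            while (urls.hasMoreElements()) {
                URL url = urls.nextElement();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        // Service files allow '#' comments; strip them and skip blank lines.
                        int hash = line.indexOf('#');
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            providers.add(line);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return providers;
    }

    public static void main(String[] args) {
        List<String> providers = listProviders(ClassLoader.getSystemClassLoader(), SERVICE_FILE);
        if (providers.isEmpty()) {
            System.out.println("No FileSystemRegistrar entries found on the classpath");
        }
        for (String provider : providers) {
            System.out.println(provider);
        }
    }
}
```

Run it with the bundled jar on the classpath. If no GCS registrar shows up in the output, the service files were most likely not merged during packaging.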

Onkar answered Oct 19 '22 10:10