Unable to access a Google Cloud Storage file using the GCS connector from Spark

I have written a Spark job on my local machine that reads a file from Google Cloud Storage through the Google Hadoop connector, using a path like gs://storage.googleapis.com/, as described in https://cloud.google.com/dataproc/docs/connectors/cloud-storage

I have set up a service account with Compute Engine and Storage permissions. My Spark configuration and code are:

    SparkConf conf = new SparkConf();
    conf.setAppName("SparkAPp").setMaster("local");
    conf.set("google.cloud.auth.service.account.enable", "true");
    conf.set("google.cloud.auth.service.account.email", "[email protected]");
    conf.set("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");
    conf.set("fs.gs.project.id", "xxx-990711");
    conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
    conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");

    SparkContext sparkContext = new SparkContext(conf);
    JavaRDD<String> data = sparkContext.textFile("gs://storage.googleapis.com/xxx/xxx.txt", 0).toJavaRDD();
    data.foreach(line -> System.out.println(line));

I have also set the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the key file, and I have tried both key file formats (JSON and P12), but I still cannot access the file. The error I get is:

java.net.UnknownHostException: metadata
java.io.IOException: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token
        at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:208)
        at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:70)

I am running the job from Eclipse with Java 8, Spark 2.2.0 dependencies and gcs-connector 1.6.1.hadoop2. I need to connect using only the service account, not the OAuth mechanism.

Thanks in advance

asked Sep 25 '17 at 14:09 by Zebronix_777




1 Answer

Are you running it locally? If so, either set the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to your key.json, or set the credential properties on the Hadoop configuration instead of on SparkConf, like this:

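    import org.apache.hadoop.conf.Configuration;

    // Set the auth keys on the Hadoop configuration, which is what the GCS connector reads; do this before reading any gs:// path
    Configuration hadoopConfiguration = sparkContext.hadoopConfiguration();
    hadoopConfiguration.set("google.cloud.auth.service.account.enable", "true"); // set() takes String values
    hadoopConfiguration.set("google.cloud.auth.service.account.email", "[email protected]");
    hadoopConfiguration.set("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");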
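A related point that likely explains the error: Spark only copies SparkConf entries whose keys start with spark.hadoop. into the Hadoop configuration (stripping that prefix). Unprefixed keys such as google.cloud.auth.service.account.keyfile set on SparkConf therefore probably never reach the connector, which then falls back to the metadata server seen in the stack trace. The sketch below uses a hypothetical helper, applyGcsAuth (not part of Spark or the connector), that writes the same keys through any (key, value) setter with an optional prefix. This lets it target either SparkConf (with spark.hadoop.) or the Hadoop configuration directly (with an empty prefix). To keep it runnable without Spark on the classpath, main uses a plain Map in place of SparkConf, and the service-account email is a made-up placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class GcsAuthSettings {
    // Spark copies every SparkConf entry whose key starts with this prefix
    // into the Hadoop Configuration, with the prefix stripped.
    public static final String SPARK_HADOOP_PREFIX = "spark.hadoop.";

    // Hypothetical helper (not part of Spark or the GCS connector): writes the
    // connector's service-account keys through any (key, value) setter,
    // prepending keyPrefix to every key.
    public static void applyGcsAuth(BiConsumer<String, String> setter, String keyPrefix,
                                    String email, String keyFile, String projectId) {
        setter.accept(keyPrefix + "google.cloud.auth.service.account.enable", "true");
        setter.accept(keyPrefix + "google.cloud.auth.service.account.email", email);
        setter.accept(keyPrefix + "google.cloud.auth.service.account.keyfile", keyFile);
        setter.accept(keyPrefix + "fs.gs.project.id", projectId);
        setter.accept(keyPrefix + "fs.gs.impl",
                "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
    }

    public static void main(String[] args) {
        // A plain Map stands in for SparkConf so this runs without Spark.
        // With Spark, you would pass conf::set on a real SparkConf instead of conf::put.
        Map<String, String> conf = new LinkedHashMap<>();
        applyGcsAuth(conf::put, SPARK_HADOOP_PREFIX,
                "my-sa@my-project.iam.gserviceaccount.com", // hypothetical placeholder email
                "/root/Documents/xxx-compute-e71ddbafd13e.p12",
                "xxx-990711");
        // Prints each key=value pair, e.g. spark.hadoop.fs.gs.project.id=xxx-990711
        conf.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With Spark on the classpath, the equivalent calls would be applyGcsAuth(conf::set, SPARK_HADOOP_PREFIX, ...) on the SparkConf before the SparkContext is created, or applyGcsAuth(sparkContext.hadoopConfiguration()::set, "", ...) afterwards. Either way, the settings should be in place before the first gs:// path is read.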
answered Oct 11 '22 at 19:10 by Optional