I have written a Spark job on my local machine which reads a file from Google Cloud Storage using the Google Hadoop connector with a gs://storage.googleapis.com/ URI, as described in https://cloud.google.com/dataproc/docs/connectors/cloud-storage
I have set up a service account with Compute Engine and Storage permissions. My Spark configuration and code are:
SparkConf conf = new SparkConf();
conf.setAppName("SparkAPp").setMaster("local");
conf.set("google.cloud.auth.service.account.enable", "true");
conf.set("google.cloud.auth.service.account.email", "[email protected]");
conf.set("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");
conf.set("fs.gs.project.id", "xxx-990711");
conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
SparkContext sparkContext = new SparkContext(conf);
JavaRDD<String> data = sparkContext.textFile("gs://storage.googleapis.com/xxx/xxx.txt", 0).toJavaRDD();
data.foreach(line -> System.out.println(line));
I have also set the environment variable GOOGLE_APPLICATION_CREDENTIALS, which points to the key file. I have tried both key files, i.e. JSON and P12, but I am unable to access the file. The error I get is:
java.net.UnknownHostException: metadata
java.io.IOException: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token
at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:208)
at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:70)
I am running my job from Eclipse with Java 8, Spark 2.2.0 dependencies, and gcs-connector 1.6.1.hadoop2. I need to connect using only the service account, not the OAuth mechanism.
Thanks in advance
Are you trying it locally? If yes, then you need to set the environment variable GOOGLE_APPLICATION_CREDENTIALS to your key.json, or set these properties on the Hadoop Configuration instead of the SparkConf. The java.net.UnknownHostException: metadata error means the connector never saw your keyfile settings (properties set on a SparkConf are only copied into the Hadoop Configuration when they carry the spark.hadoop. prefix), so it fell back to the Compute Engine metadata server, which only exists on a GCE VM. Set them like:
Configuration hadoopConfiguration = sparkContext.hadoopConfiguration();
hadoopConfiguration.set("google.cloud.auth.service.account.enable", true);
hadoopConfiguration.set("google.cloud.auth.service.account.email", "[email protected]");
hadoopConfiguration.set("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");
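For completeness, here is a minimal self-contained sketch of the whole job with everything moved onto the Hadoop configuration. The project id, service-account email, key path, bucket, and object names are placeholders, not values from your setup. Note that the authority of a gs:// URI is the bucket name itself (gs://my-bucket/file.txt), not storage.googleapis.com, and that fs.AbstractFileSystem.gs.impl should point at GoogleHadoopFS rather than GoogleHadoopFileSystem:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class GcsReadJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkApp").setMaster("local[*]");
        JavaSparkContext sparkContext = new JavaSparkContext(conf);

        // The GCS connector reads its settings from the Hadoop configuration,
        // not from the SparkConf.
        Configuration hadoopConf = sparkContext.hadoopConfiguration();
        hadoopConf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
        hadoopConf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS");
        hadoopConf.set("fs.gs.project.id", "your-project-id");                                    // placeholder
        hadoopConf.set("google.cloud.auth.service.account.enable", "true");
        hadoopConf.set("google.cloud.auth.service.account.email", "[email protected]"); // placeholder
        hadoopConf.set("google.cloud.auth.service.account.keyfile", "/path/to/key.p12");          // placeholder

        // The URI names the bucket directly: gs://<bucket>/<object>
        JavaRDD<String> data = sparkContext.textFile("gs://my-bucket/path/file.txt");
        data.foreach(line -> System.out.println(line));

        sparkContext.stop();
    }
}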