Hadoop configuration in sparkR


I am having some problems configuring Hadoop with SparkR in order to read/write data from Amazon S3.
For example, these are the commands that work in PySpark (to solve the same issue):

sc._jsc.hadoopConfiguration().set("fs.s3n.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "myaccesskey")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "mysecretaccesskey")
sc._jsc.hadoopConfiguration().set("fs.s3n.endpoint", "myentrypoint")

Could anybody help me work this out?

asked by CVec

2 Answers

A solution closer to what you are doing with PySpark can be achieved by using callJMethod (https://github.com/apache/spark/blob/master/R/pkg/R/backend.R#L31)

> hConf = SparkR:::callJMethod(sc, "hadoopConfiguration")
> SparkR:::callJMethod(hConf, "set", "a", "b")
NULL
> SparkR:::callJMethod(hConf, "get", "a")
[1] "b"

UPDATE:

hadoopConfiguration didn't work for me; conf worked, though. Presumably it was changed at some point.
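
Applied to the S3 settings from the question, a minimal sketch would look like the following (assuming sc is the context object returned by sparkR.init, and that the placeholder key/endpoint values from the question are replaced with real ones):

# Grab the Hadoop Configuration object from the context.
# Use "conf" instead of "hadoopConfiguration" if this fails on your Spark version (see the update above).
hConf <- SparkR:::callJMethod(sc, "hadoopConfiguration")

# Set the same s3n properties as in the PySpark snippet from the question.
SparkR:::callJMethod(hConf, "set", "fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
SparkR:::callJMethod(hConf, "set", "fs.s3n.awsAccessKeyId", "myaccesskey")
SparkR:::callJMethod(hConf, "set", "fs.s3n.awsSecretAccessKey", "mysecretaccesskey")
SparkR:::callJMethod(hConf, "set", "fs.s3n.endpoint", "myentrypoint")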

answered by Philipp Langer

You can set

<property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>

in your core-site.xml (YARN configuration).
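
If you prefer to keep the credentials and endpoint from the question in the same file rather than setting them from code, a sketch of the additional properties might look like this (values are the placeholders from the question; property names assume the s3n connector):

<property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>myaccesskey</value>
</property>
<property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>mysecretaccesskey</value>
</property>
<property>
    <name>fs.s3n.endpoint</name>
    <value>myentrypoint</value>
</property>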

answered by besil