Hadoop configuration in sparkR


I am having some problems configuring Hadoop with SparkR in order to read/write data from Amazon S3.
For example, these are the commands that work in PySpark (to solve the same issue):

sc._jsc.hadoopConfiguration().set("fs.s3n.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "myaccesskey")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "mysecretaccesskey")
sc._jsc.hadoopConfiguration().set("fs.s3n.endpoint", "myentrypoint")

Could anybody help me work this out?

asked by CVec

2 Answers

A solution closer to what you are doing with PySpark can be achieved by using callJMethod (https://github.com/apache/spark/blob/master/R/pkg/R/backend.R#L31)

> hConf = SparkR:::callJMethod(sc, "hadoopConfiguration")
> SparkR:::callJMethod(hConf, "set", "a", "b")
NULL
> SparkR:::callJMethod(hConf, "get", "a")
[1] "b"

UPDATE:

hadoopConfiguration didn't work for me; conf worked, though. Presumably it was changed at some point.
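
Applied to the S3 settings from the question, a minimal sketch would look like the following (assuming sc is the context object returned by sparkR.init, and that the placeholder key/endpoint values from the question are replaced with real ones):

# Grab the Hadoop Configuration object from the context.
# Use "conf" instead of "hadoopConfiguration" if this fails on your Spark version (see the update above).
hConf <- SparkR:::callJMethod(sc, "hadoopConfiguration")

# Set the same s3n properties as in the PySpark snippet from the question.
SparkR:::callJMethod(hConf, "set", "fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
SparkR:::callJMethod(hConf, "set", "fs.s3n.awsAccessKeyId", "myaccesskey")
SparkR:::callJMethod(hConf, "set", "fs.s3n.awsSecretAccessKey", "mysecretaccesskey")
SparkR:::callJMethod(hConf, "set", "fs.s3n.endpoint", "myentrypoint")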

answered by Philipp Langer

You can set

<property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>

in your core-site.xml (YARN configuration).
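
If you prefer to keep the credentials and endpoint from the question in the same file rather than setting them from code, a sketch of the additional properties might look like this (values are the placeholders from the question; property names assume the s3n connector):

<property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>myaccesskey</value>
</property>
<property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>mysecretaccesskey</value>
</property>
<property>
    <name>fs.s3n.endpoint</name>
    <value>myentrypoint</value>
</property>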

answered by besil