I tried using sparklyr to write data to HDFS or Hive, but was unable to find a way. Is it even possible to write an R data frame to HDFS or Hive using sparklyr? Please note that my R and Hadoop installations run on two different servers, so I need a way to write to a remote HDFS from R.
Regards, Rahul
Writing a Spark table to Hive using sparklyr:
# copy_to() already registers the data as a temporary Spark table,
# so no separate sdf_copy_to() call is needed
iris_spark_table <- copy_to(sc, iris, overwrite = TRUE)
DBI::dbGetQuery(sc, "CREATE TABLE iris_hive AS SELECT * FROM iris_spark_table")
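To confirm the Hive table was actually created, you can query it back over the same connection. A quick check, assuming `sc` is a live `spark_connect()` connection and the code above has run:

```r
library(sparklyr)

# Pull a row count from the new Hive table into R
DBI::dbGetQuery(sc, "SELECT COUNT(*) AS n FROM iris_hive")

# Or reference the table lazily with dplyr for further transformations
iris_hive_tbl <- dplyr::tbl(sc, "iris_hive")
```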
As of the latest sparklyr release you can use spark_write_table. Pass the name in the format database.table_name to specify a database:
iris_spark_table <- copy_to(sc, iris, overwrite = TRUE)
spark_write_table(
  iris_spark_table,
  name = 'my_database.iris_hive',
  mode = 'overwrite'
)
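Since the question also asks about writing to HDFS directly (not only Hive), it is worth noting that sparklyr's spark_write_* family accepts an HDFS URI as the path, and the write happens on the cluster side, so R and Hadoop being on different servers is not a problem. A sketch, assuming a live connection `sc`; the NameNode host, port, and target path below are placeholders you would replace with your own:

```r
library(sparklyr)

iris_spark_table <- copy_to(sc, iris, overwrite = TRUE)

# Write the Spark DataFrame to a remote HDFS path as Parquet;
# "namenode-host:8020" and the directory are placeholder values
spark_write_parquet(
  iris_spark_table,
  path = "hdfs://namenode-host:8020/user/rahul/iris_parquet",
  mode = "overwrite"
)
```

spark_write_csv() and spark_write_json() work the same way if you need a different file format.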
Also see this SO post, where I got some input on more options.