
How to load csv file into SparkR on RStudio?

How do I load a CSV file into SparkR from RStudio? Below are the steps I performed to run SparkR in RStudio. I used read.df to read the .csv file and am not sure how else to write this. I am also not sure whether this step counts as creating an RDD.

#Set sys environment variables
Sys.setenv(SPARK_HOME = "C:/Users/Desktop/spark/spark-1.4.1-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

#Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')

#Load libraries
library(SparkR)
library(magrittr)

sc <- sparkR.init(master="local")
sc <- sparkR.init()
sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.0.3")
sqlContext <- sparkRSQL.init(sc)

data <- read.df(sqlContext, "C:/Users/Desktop/DataSets/hello_world.csv", "com.databricks.spark.csv", header="true")

I am getting this error:

Error in writeJobj(con, object) : invalid jobj 1
asked Sep 30 '15 18:09 by sharp



1 Answer

Spark 2.0.0+:

You can use the built-in csv data source:

loadDF(sqlContext, path="some_path", source="csv", header="true")

without loading spark-csv.
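
For a session started from RStudio, the Spark 2.x approach might look roughly like the sketch below. It assumes Spark 2.0.0 or later with SPARK_HOME already set; read_csv_spark2 is a hypothetical helper name, and the inferSchema option is an optional extra that is not part of the answer above. In Spark 2.x, sparkR.session() takes the place of sparkR.init() plus sparkRSQL.init(), and read.df no longer needs a sqlContext argument.

```r
# Hypothetical helper (not part of SparkR): read a CSV with the built-in
# "csv" data source. Assumes Spark 2.0.0+.
read_csv_spark2 <- function(path,
                            spark_home = Sys.getenv("SPARK_HOME"),
                            master = "local[*]") {
  # Load the SparkR package that ships inside the Spark distribution.
  library(SparkR, lib.loc = file.path(spark_home, "R", "lib"))

  # Start (or reuse) the Spark session; no spark-csv package is requested.
  sparkR.session(master = master, sparkHome = spark_home)

  # header = "true" treats the first row as column names;
  # inferSchema = "true" (an assumption here) asks Spark to guess column types.
  read.df(path, source = "csv", header = "true", inferSchema = "true")
}
```

Calling read_csv_spark2("C:/Users/Desktop/DataSets/hello_world.csv") and passing the result to head() should then show the first rows, provided the file exists at that path.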

Original answer:

As far as I can tell, you're using the wrong version of spark-csv. Pre-built Spark 1.x binaries are compiled against Scala 2.10, but you're loading the spark-csv build for Scala 2.11. Try this instead:

sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.10:1.2.0")
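
Put together with the rest of the question's setup, the fix could be sketched as follows. This is a sketch under assumptions, not a tested recipe: spark_csv_coordinate and read_csv_spark1 are hypothetical helper names, and the choice to call sparkR.init only once is my own reading of the question's code. SparkR reuses an already running context, so the sparkPackages argument of a later sparkR.init call would likely be ignored.

```r
# Hypothetical helper: build the Maven coordinate for spark-csv.
# The Scala suffix has to match the Scala version Spark was built with.
spark_csv_coordinate <- function(scala_version = "2.10",
                                 package_version = "1.2.0") {
  sprintf("com.databricks:spark-csv_%s:%s", scala_version, package_version)
}

# Hypothetical helper for Spark 1.4.1+ (1.x line): start SparkR once, read a CSV.
read_csv_spark1 <- function(path, spark_home, scala_version = "2.10") {
  Sys.setenv(SPARK_HOME = spark_home)
  library(SparkR, lib.loc = file.path(spark_home, "R", "lib"))

  # A single sparkR.init() call that requests the matching spark-csv build.
  sc <- sparkR.init(master = "local",
                    sparkPackages = spark_csv_coordinate(scala_version))
  sqlContext <- sparkRSQL.init(sc)

  # header = "true" treats the first row as column names.
  read.df(sqlContext, path,
          source = "com.databricks.spark.csv", header = "true")
}
```

If a context from an earlier attempt is still running, stopping it with sparkR.stop() or restarting R before calling read_csv_spark1 is probably necessary for the package setting to take effect.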
answered Sep 23 '22 19:09 by zero323