
Why can't I import org.apache.spark.sql.DataFrame?

I have the Maven dependencies spark-sql_2.1.0 and spark-hive_2.1.0. However, when I try to import org.apache.spark.sql.DataFrame, I get an error, while importing org.apache.spark.sql.SQLContext works without errors. Why?

Jason Shu asked Dec 11 '22 10:12



2 Answers

In Spark 2.x, DataFrame has become a type alias: type DataFrame = Dataset[Row]. Java doesn't have type aliases, so DataFrame is not available in Java. Use the type Dataset<Row> instead, and import both org.apache.spark.sql.Dataset and org.apache.spark.sql.Row.

T. Gawęda answered Dec 25 '22 06:12



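import org.apache.spark.sql.DataFrame

works in Scala but not in Java: in Spark 2.x, DataFrame is only a Scala type alias for Dataset[Row], so there is no DataFrame class for Java to import. In Java you can use Dataset<Row> instead, as explained in the Spark SQL, DataFrames and Datasets Guide.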

You can import the following:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

and use them like this:

Dataset<Row> peopleDataFrame = spark.createDataFrame(rowRDD, schema);

Or

Dataset<Row> peopleDF = spark.createDataFrame(peopleRDD, Person.class);

Or

Dataset<Row> usersDF = spark.read().load("examples/src/main/resources/users.parquet");
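
The snippets above assume that spark, rowRDD, schema and so on already exist. As a rough end-to-end sketch, assuming Spark 2.x (spark-sql) is on the classpath and a local master is acceptable, the following builds a Dataset<Row> from an in-memory list of rows. The class name DatasetRowExample, the helper buildPeople, and the sample data are made up for illustration and do not come from the question.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical example class; requires the Spark SQL dependency on the classpath.
public class DatasetRowExample {

    // Builds a small Dataset<Row>, which is what Scala code calls DataFrame.
    static Dataset<Row> buildPeople(SparkSession spark) {
        // Schema with two non-nullable columns (illustrative names).
        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("name", DataTypes.StringType, false),
                DataTypes.createStructField("age", DataTypes.IntegerType, false)));

        // Sample rows matching the schema above.
        List<Row> rows = Arrays.asList(
                RowFactory.create("Alice", 30),
                RowFactory.create("Bob", 25));

        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        // Local session for experimentation only.
        SparkSession spark = SparkSession.builder()
                .appName("DatasetRowExample")
                .master("local[*]")
                .getOrCreate();
        try {
            Dataset<Row> people = buildPeople(spark);
            people.printSchema();
            people.show();
        } finally {
            spark.stop();
        }
    }
}
```

Note that the method is still called createDataFrame in the Java API; only its return type is spelled Dataset<Row>.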
Ramesh Maharjan answered Dec 25 '22 06:12
