I have the Maven dependencies spark-sql_2.1.0 and spark-hive_2.1.0. However, when I try to import org.apache.spark.sql.DataFrame, I get an error, whereas importing org.apache.spark.sql.SQLContext works without errors. Why?
In Spark 2.x, DataFrame has become a Scala type alias, type DataFrame = Dataset[Row]. Java doesn't have type aliases, so DataFrame is not available in Java. Use the type Dataset<Row> instead, and import both org.apache.spark.sql.Dataset and org.apache.spark.sql.Row.
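One way to see why the import fails is to ask the JVM which of these names actually exist as classes. The sketch below is a hypothetical helper (the class name ClasspathCheck is made up for illustration) that tries to load each name without initializing it. With Spark 2.x jars on the classpath, you would expect only the DataFrame lookup to fail, because a Scala type alias does not compile to a class file that Java code could import.

```java
public class ClasspathCheck {

    // Returns true if a class or interface with this fully qualified name can be loaded.
    // initialize=false loads the class without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.spark.sql.DataFrame",  // Scala type alias in Spark 2.x: no class file
            "org.apache.spark.sql.Dataset",    // real class
            "org.apache.spark.sql.Row",        // real interface (a Scala trait)
            "org.apache.spark.sql.SQLContext"  // real class, which is why its import compiles
        };
        for (String name : names) {
            System.out.println(name + " -> " + isOnClasspath(name));
        }
    }
}
```

The same reasoning explains the question: SQLContext is an ordinary class, so its import compiles, while DataFrame only exists for the Scala compiler.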
import org.apache.spark.sql.DataFrame works in Scala but not in Java, because DataFrame exists only as a Scala type alias and there is no DataFrame class for Java code to import. In Java, use Dataset<Row> instead, as explained in the Spark SQL, DataFrames and Datasets Guide.
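You can import the following:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
```

and use them like this:

```java
// From an RDD of Row objects plus an explicit StructType schema
Dataset<Row> peopleDataFrame = spark.createDataFrame(rowRDD, schema);

// Or from an RDD of JavaBeans, inferring the schema from the Person class
Dataset<Row> peopleDF = spark.createDataFrame(peopleRDD, Person.class);

// Or by loading a file (Parquet is the default data source format)
Dataset<Row> usersDF = spark.read().load("examples/src/main/resources/users.parquet");
```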
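The snippets above assume that a SparkSession named spark and the inputs rowRDD, schema and peopleRDD are defined elsewhere. Below is a self-contained sketch, assuming a Spark 2.x spark-sql dependency on the classpath; the class name DatasetRowExample and the sample data are hypothetical. It builds a Dataset<Row> from an in-memory List<Row> with an explicit schema, using the createDataFrame(List<Row>, StructType) overload of SparkSession.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class DatasetRowExample {

    // Builds a small Dataset<Row> (what Scala calls a DataFrame) and returns its row count.
    static long buildAndCount(SparkSession spark) {
        // Explicit schema: a non-nullable string column and a nullable integer column
        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("name", DataTypes.StringType, false),
                DataTypes.createStructField("age", DataTypes.IntegerType, true)));

        // Rows are untyped; values must follow the schema's field order
        List<Row> rows = Arrays.asList(
                RowFactory.create("Alice", 29),
                RowFactory.create("Bob", 35));

        // In Java the return type is written Dataset<Row>; there is no DataFrame class
        Dataset<Row> people = spark.createDataFrame(rows, schema);
        people.printSchema();
        people.show();
        return people.count();
    }

    public static void main(String[] args) {
        // local[*] runs Spark in-process, using all available cores
        SparkSession spark = SparkSession.builder()
                .appName("DatasetRowExample")
                .master("local[*]")
                .getOrCreate();
        try {
            System.out.println("rows: " + buildAndCount(spark));
        } finally {
            spark.stop();
        }
    }
}
```

Note that the method is still named createDataFrame; only the Java type you declare changes to Dataset<Row>.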