 

How to convert List to JavaRDD

Tags:

apache-spark

We know that Spark has a method, rdd.collect, which converts an RDD to a list.

List<String> f = rdd.collect();
String[] array = f.toArray(new String[f.size()]);

I am trying to do exactly the opposite in my project. I have an ArrayList of String that I want to convert to a JavaRDD. I have been looking for a solution for quite some time but have not found one. Can anybody please help me out here?

Amitabh Ranjan asked Jul 25 '14 09:07



People also ask

Which method is used to convert the list to RDD?

The parallelize function converts a list of objects to an RDD, which can then be converted to a DataFrame through SparkSession. As in PySpark, SparkContext.parallelize creates the RDD; in the Scala API, SparkContext.makeRDD can be used as an alternative.

Can we convert dataset to RDD?

A Dataset is a strongly typed DataFrame, so both a Dataset and a DataFrame can be converted to an RDD with .rdd.

What is JavaPairRDD?

JavaPairRDD declares to the developer that a key and a value are required. A regular JavaRDD can be used for operations that don't require an explicit key field; these are generic operations on arbitrary element types.


1 Answer

You're looking for JavaSparkContext.parallelize(List) and similar. This is just like in the Scala API.
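A minimal sketch of what that could look like in Java, assuming Spark is on the classpath and a local master is acceptable for trying it out. The class name ListToRddExample, the helper roundTrip, and the app name are made up for illustration; only JavaSparkContext.parallelize and collect come from the discussion above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ListToRddExample {

    // Hypothetical helper: turns a List into a JavaRDD and collects it back,
    // so the conversion can be checked end to end.
    static List<String> roundTrip(List<String> input) {
        // Assumption: a local master, suitable only for experimenting.
        SparkConf conf = new SparkConf()
                .setAppName("ListToRddExample")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // List -> JavaRDD: the opposite direction of rdd.collect().
            JavaRDD<String> rdd = sc.parallelize(input);
            // JavaRDD -> List, as in the question.
            return rdd.collect();
        } finally {
            // Release the context's resources.
            sc.stop();
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("alpha", "beta", "gamma"));
        System.out.println(roundTrip(words));
    }
}
```

In a real job you would normally reuse an existing JavaSparkContext rather than create one per call; parallelize also has an overload that takes a number of slices as a second argument if you want to control partitioning.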

Sean Owen answered Oct 17 '22 07:10
