We know that in Spark there is a method rdd.collect() which converts an RDD to a List:

List<String> f = rdd.collect();
String[] array = f.toArray(new String[f.size()]);
I am trying to do exactly the opposite in my project. I have an ArrayList of Strings which I want to convert to a JavaRDD. I have been looking for a solution for quite some time but have not found an answer. Can anybody please help me out here?
The parallelize function can be used to convert a list of objects to an RDD, and the RDD can then be converted to a DataFrame through the SparkSession. As in PySpark, we can use SparkContext.parallelize to create the RDD; alternatively, we can use SparkContext.makeRDD to convert a list to an RDD.
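A minimal sketch of that flow in Java, assuming a local-mode SparkSession (the class name, app name, and column name here are illustrative, not from the original post):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ListToRddExample {

    // Builds list -> RDD -> DataFrame and returns the row count.
    static long rowCount() {
        SparkSession spark = SparkSession.builder()
                .appName("list-to-rdd")
                .master("local[*]") // local mode, for illustration only
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        List<String> data = Arrays.asList("a", "b", "c");

        // List -> RDD via parallelize
        JavaRDD<String> rdd = jsc.parallelize(data);

        // RDD -> DataFrame via SparkSession: go through a typed Dataset
        Dataset<Row> df = spark
                .createDataset(rdd.rdd(), Encoders.STRING())
                .toDF("value");

        long n = df.count();
        spark.stop();
        return n;
    }

    public static void main(String[] args) {
        System.out.println(rowCount());
    }
}
```

Running this in local mode prints the number of rows that came from the original list.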
A Dataset is a strongly typed DataFrame, so both Dataset and DataFrame can be converted to an RDD via .rdd.
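For example, a sketch assuming an active local-mode SparkSession (names are illustrative); from Java, .toJavaRDD() is usually more convenient than the Scala-typed .rdd():

```java
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class DatasetToRddExample {

    static long countViaRdd() {
        SparkSession spark = SparkSession.builder()
                .appName("dataset-to-rdd")
                .master("local[*]") // local mode, for illustration only
                .getOrCreate();

        Dataset<String> ds = spark.createDataset(
                Arrays.asList("x", "y"), Encoders.STRING());

        RDD<String> scalaRdd = ds.rdd();          // Scala RDD
        JavaRDD<String> javaRdd = ds.toJavaRDD(); // Java-friendly wrapper

        // Both views see the same two elements
        long n = scalaRdd.count() + javaRdd.count();
        spark.stop();
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countViaRdd());
    }
}
```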
JavaPairRDD is there to declare the contract to the developer that a key and a value are required. A regular JavaRDD can be used for operations that don't require an explicit key field; these are generic operations over arbitrary element types.
You're looking for JavaSparkContext.parallelize(List) and similar, just like in the Scala API.
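A minimal end-to-end sketch of exactly that, assuming local mode (the class name and app name are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelizeExample {

    // ArrayList<String> -> JavaRDD<String>, returning the element count
    static long rddCount(List<String> list) {
        SparkConf conf = new SparkConf()
                .setAppName("parallelize-example")
                .setMaster("local[*]"); // local mode, for illustration only
        // JavaSparkContext implements Closeable, so try-with-resources works
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> rdd = sc.parallelize(list);
            return rdd.count();
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("one");
        list.add("two");
        System.out.println(rddCount(list));
    }
}
```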