
Convert a simple one line string to RDD in Spark

I have a simple line:

line = "Hello, world"

I would like to convert it to an RDD with only one element. I have tried

sc.parallelize(line)

But I get:

sc.parallelize(line).collect()
['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd']

Any ideas?

asked Oct 02 '14 09:10 by poiuytrez



2 Answers

In Scala, try passing a List as the parameter:

sc.parallelize(List(line)).collect()

It returns:

res1: Array[String] = Array(Hello, world)
answered Oct 24 '22 17:10 by michaeltang


The code below works in Python:

sc.parallelize([line]).collect()
['Hello, world']

Here line is wrapped in a list, so parallelize receives a collection with one element instead of iterating over the characters of the string.
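
To see why the wrapping matters, here is a minimal plain-Python sketch. It assumes, as the output in the question suggests, that parallelize treats its argument as a collection and iterates over it. No Spark is needed to run it: list() stands in for that iteration, and the variable names as_characters and as_single_element are only illustrative.

```python
# Plain-Python illustration; no Spark installation is needed to run it.
line = "Hello, world"

# A str is itself an iterable of one-character strings, so anything that
# iterates over it sees each character as a separate element.
as_characters = list(line)

# Wrapping the string in a list gives an iterable with exactly one element.
as_single_element = list([line])

print(as_characters)
print(as_single_element)
```

With a SparkContext available as sc, the same distinction is what separates sc.parallelize(line) from sc.parallelize([line]).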

answered Oct 24 '22 18:10 by Dhruv