Convert a simple one line string to RDD in Spark

Tags: python, distributed-computing, apache-spark, rdd, pyspark

I have a simple line:

line = "Hello, world"

I would like to convert it to an RDD with only one element. I have tried

sc.parallelize(line)

But I get:

sc.parallelize(line).collect()
['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd']

Any ideas?

asked Oct 02 '14 09:10

poiuytrez

2 Answers

Try passing a List as the parameter (this is the Scala API):

sc.parallelize(List(line)).collect()

It returns a single-element array:

res1: Array[String] = Array(Hello, world)

answered Oct 24 '22 17:10

michaeltang

The following code works in Python:

sc.parallelize([line]).collect()
['Hello, world']

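Here "line" is wrapped in a list, so parallelize receives a collection with one element (the whole string) instead of iterating over the string's individual characters.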
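
To see why the wrapping matters, here is a minimal plain-Python sketch that needs no Spark installation. It relies on the fact that a Python string is itself an iterable of one-character strings, which is what produces the per-character RDD in the question. The variable names chars and wrapped are only illustrative, and the commented-out Spark call assumes an already running SparkContext named sc, as in the question.

```python
line = "Hello, world"

# A str is an iterable of one-character strings, so anything that
# iterates over its argument sees one item per character.
chars = list(line)

# Wrapping the string in a list gives an iterable with exactly one item.
wrapped = list([line])

print(len(chars))   # 12
print(wrapped)      # ['Hello, world']

# Assumption: with a live SparkContext named sc (not created here),
# the same wrapping should yield a one-element RDD:
#   rdd = sc.parallelize([line])
#   rdd.collect()
```

The same idea applies to any single value you want as one RDD element: put it inside a list (or another collection) before passing it to parallelize.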

answered Oct 24 '22 18:10

Dhruv
