Convert an RDD to iterable: PySpark?

I have an RDD that I create by loading a text file and preprocessing it. I don't want to collect it and hold the entire data set on disk or in memory. Instead, I want to pass it to another Python function that consumes the data one element at a time, in the form of an iterable.

How is this possible?

data = sc.textFile('file.txt').map(lambda x: some_func(x))

an_iterable = data.  ## what should I do here to make it give me one element at a time?

def model(an_iterable):
    for i in an_iterable:
        do_that(i)

model(an_iterable)
pg2455 asked Sep 24 '15 22:09


1 Answer

I believe what you want is toLocalIterator(), which returns an iterator over the elements of the RDD.
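A minimal sketch of how that call could slot into the question's code, assuming a working PySpark installation. The names `run_on_spark` and the extra `do_that` parameter are hypothetical and only there to keep the example self-contained; `some_func` and `do_that` stand in for the question's own functions.

```python
from typing import Callable, Iterable


def model(an_iterable: Iterable[str], do_that: Callable[[str], None]) -> int:
    """Consume elements one at a time and return how many were seen."""
    count = 0
    for item in an_iterable:
        do_that(item)
        count += 1
    return count


def run_on_spark(path: str,
                 some_func: Callable[[str], str],
                 do_that: Callable[[str], None]) -> int:
    """Hypothetical helper: feed a preprocessed text file to model()."""
    # Imported here so model() can be used and tested without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    data = sc.textFile(path).map(some_func)
    # toLocalIterator() returns a plain Python iterator over the RDD's
    # elements, instead of materializing a full list the way collect() does.
    return model(data.toLocalIterator(), do_that)
```

Because `model` only needs an iterable, it can be tried with an ordinary list first, for example `model(["a", "b"], print)`. According to the PySpark documentation, the iterator consumes as much memory as the largest partition of the RDD, so the full data set does not have to fit on the driver at once.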

danf1024 answered Oct 17 '22 06:10