
PySpark RDD: 'RDD' object has no attribute 'flatmap'

I am new to PySpark and I am trying to apply flatmap to a PySpark RDD object. However, even though the documentation shows that this function exists for the PySpark RDD class, I can't manage to use it. I get the following error:

AttributeError: 'RDD' object has no attribute 'flatmap'

I am calling the function in the following line:

my_rdd = my_rdd.flatmap(lambda r: (r[5].split('|')))

The imports are the following:

from pyspark.sql import *
from pyspark.sql.functions import *
from pyspark.sql import SparkSession
from pyspark import SparkContext as sc
from pyspark import SparkFiles
spark = SparkSession.builder.getOrCreate()

Additionally, other methods such as my_rdd.count work, which makes me think the SparkContext is set up correctly.
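If some methods resolve and one does not, the error is probably about the attribute name rather than the context. A quick check is to search the object's attributes case-insensitively with dir(). The sketch below uses a hypothetical FakeRDD stand-in so that it runs without Spark. On a real RDD, the same helper called on my_rdd should list the correctly cased method name.

```python
def find_case_variants(obj, name):
    """Return the attribute names on obj that equal `name` when case is ignored."""
    target = name.lower()
    return sorted(n for n in dir(obj) if n.lower() == target)


# Hypothetical stand-in for a PySpark RDD, used only so this runs without Spark.
class FakeRDD:
    def count(self):
        return 0

    def flatMap(self, f):
        return self


rdd = FakeRDD()
print(hasattr(rdd, "flatmap"))             # False: attribute lookup is case-sensitive
print(find_case_variants(rdd, "flatmap"))  # ['flatMap']
```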

Do you have any idea why it fails?

Rémi Petitpierre asked Jan 27 '23 05:01



1 Answer

my_rdd = my_rdd.flatMap(lambda r: (r[5].split('|')))

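Uppercase M! The method is flatMap, not flatmap. Python attribute names are case-sensitive, so the lowercase spelling raises AttributeError even though the SparkContext is fine.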
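For anyone unsure what flatMap does compared with map, here is a plain-Python model of the two operations. It does not use Spark. The rows below are hypothetical sample data in which field 5 holds pipe-delimited values, matching the lambda in the question. map yields one list per row, while flatMap flattens those lists into a single sequence of elements.

```python
from itertools import chain

# Hypothetical sample rows; field 5 holds pipe-delimited values.
rows = [
    ("a", 1, 2, 3, 4, "x|y"),
    ("b", 5, 6, 7, 8, "z"),
]


def split_field(r):
    """Same logic as the question's lambda: split field 5 on '|'."""
    return r[5].split("|")


# Like my_rdd.map(split_field): one output element (a list) per input row.
mapped = [split_field(r) for r in rows]

# Like my_rdd.flatMap(split_field): the per-row lists are flattened.
flat_mapped = list(chain.from_iterable(split_field(r) for r in rows))

print(mapped)       # [['x', 'y'], ['z']]
print(flat_mapped)  # ['x', 'y', 'z']
```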

thebluephantom answered Jan 29 '23 20:01
