I am new to PySpark and I am trying to apply a flatmap to a PySpark RDD object. According to the documentation, this function clearly exists on the PySpark RDD class, yet I can't manage to use it and get the following error:
AttributeError: 'RDD' object has no attribute 'flatmap'
I am calling the function in the following line:
my_rdd = my_rdd.flatmap(lambda r: (r[5].split('|')))
The imports are the following:
from pyspark.sql import *
from pyspark.sql.functions import *
from pyspark.sql import SparkSession
from pyspark import SparkContext as sc
from pyspark import SparkFiles
spark = SparkSession.builder.getOrCreate()
Additionally, some other functions, such as my_rdd.count, work, which makes me think that the SparkContext is set up correctly.
Do you have any idea why it could fail?
my_rdd = my_rdd.flatMap(lambda r: (r[5].split('|')))
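The method is called flatMap, with an uppercase M. Python attribute names are case-sensitive, so flatmap does not exist on the RDD class.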
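To see what the lambda passed to flatMap produces, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. The sample rows are hypothetical (only index 5 matters, mirroring r[5] in the question), and itertools.chain stands in for the flattening step; it is an analogue of the behaviour, not PySpark's implementation.

```python
from itertools import chain

# Hypothetical sample rows; the pipe-delimited field sits at index 5,
# as in the question's r[5].
rows = [
    ("a", "b", "c", "d", "e", "x|y"),
    ("a", "b", "c", "d", "e", "z"),
]

# Same function as the one passed to flatMap in the question.
split_field = lambda r: r[5].split('|')

# map-like behaviour: one list per input row.
mapped = [split_field(r) for r in rows]

# flatMap-like behaviour: the per-row lists are concatenated into one sequence.
flattened = list(chain.from_iterable(split_field(r) for r in rows))

print(mapped)
print(flattened)
```

With an actual RDD, my_rdd.map(...) would correspond to the first result and my_rdd.flatMap(...) to the second.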