
Spark error: "It appears that you are attempting to reference SparkContext from a broadcast"

Here are the functions in my class:

def labeling(self, value, labelMap, dtype='string'):
    if dtype.value == 'string':
        result = [i for v, i in labelMap.value if value == v][0]
        return result
    else:
        result = [i for v, i in labelMap.value if value < v][0]
        return result

def labelByValue(self, labelMap, dtype='string'):
    labeling = self.labeling
    labelMap = self.sc.broadcast(labelMap)
    dtype = self.sc.broadcast(dtype)
    self.RDD = self.RDD.map(labeling)

But when I call the method below from "main", it reports an error like: "It appears that you are attempting to reference SparkContext from a broadcast"

class.RDD.labelByValue((('a', 1), ('b', 2), ('c', 3)))

I could not find a solution on my own, so I came here for help. Thanks in advance.

Weinrot asked Mar 16 '23 10:03


1 Answer

I finally fixed this error.

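The problem was that the user-defined function passed to map should be defined at module level (in the global scope), not as a method of the class. A bound method such as self.labeling carries self along with it, and self holds the SparkContext in self.sc, which cannot be shipped to the workers.

So labeling should be defined like this: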

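def labeling(value, labelMap, dtype):
    # labelMap and dtype are broadcast variables, so read them through .value
    if dtype.value == 'string':
        # exact match: return the label paired with this value
        return [i for v, i in labelMap.value if value == v][0]
    else:
        # ordered case: return the label of the first bound greater than value
        return [i for v, i in labelMap.value if value < v][0]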
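
For completeness, here is a minimal sketch of how a module-level function might be wired back into the class. It is an illustration under assumptions, not the original code: the `Labeler` class, the `demo` helper, and the snake_case names are hypothetical, and this version of `labeling` takes plain Python values (the broadcast is unwrapped inside the lambda) so it can be checked without a cluster. The lambda passed to `map` closes only over local names, so neither `self` nor `self.sc` ends up in the serialized closure.

```python
def labeling(value, label_map, dtype="string"):
    """Module-level function: it references no object that holds a SparkContext."""
    if dtype == "string":
        # exact match on the key
        matches = [label for key, label in label_map if value == key]
    else:
        # first upper bound that is greater than the value
        matches = [label for bound, label in label_map if value < bound]
    # Assumption: return None when nothing matches instead of raising IndexError.
    return matches[0] if matches else None


class Labeler:
    """Hypothetical wrapper standing in for the class from the question."""

    def __init__(self, sc, rdd):
        self.sc = sc      # SparkContext: must stay on the driver
        self.RDD = rdd

    def labelByValue(self, labelMap, dtype="string"):
        # Copy what the closure needs into local names so the lambda
        # below never touches `self` (and therefore never touches self.sc).
        label_map_bc = self.sc.broadcast(labelMap)
        self.RDD = self.RDD.map(
            lambda value: labeling(value, label_map_bc.value, dtype)
        )
        return self


def demo():
    """Run the sketch locally; requires pyspark to be installed."""
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    labeler = Labeler(sc, sc.parallelize(["a", "b", "c"]))
    label_map = (("a", 1), ("b", 2), ("c", 3))
    print(labeler.labelByValue(label_map).RDD.collect())
```

Calling `demo()` on a machine with PySpark installed runs the mapping end to end. Only the label map is broadcast here; `dtype` is a short string, so capturing it directly in the closure is cheap.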
Weinrot answered May 01 '23 01:05