class ProdsTransformer:
    def __init__(self):
        self.products_lookup_hmap = {}
        self.broadcast_products_lookup_map = None

    def create_broadcast_variables(self):
        # `sc` is the SparkContext defined on the driver.
        self.broadcast_products_lookup_map = sc.broadcast(self.products_lookup_hmap)

    def create_lookup_maps(self):
        # The code here builds the hashmap that maps Prod_ID to another space.
        pass

pt = ProdsTransformer()
pt.create_broadcast_variables()

pairs = distinct_users_projected.map(lambda x: (x.user_id,
                                                pt.broadcast_products_lookup_map.value[x.Prod_ID]))
I get the following error:
"Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."
Any help with how to deal with the broadcast variables will be great!
Because your map lambda references the object that contains the broadcast variable (pt), Spark tries to serialize that whole object and ship it to the workers. Since the object also holds a reference to the SparkContext (via sc in create_broadcast_variables), you get this error. Instead of this:
pairs = distinct_users_projected.map(lambda x: (x.user_id, pt.broadcast_products_lookup_map.value[x.Prod_ID]))
Try this:
bcast = pt.broadcast_products_lookup_map
pairs = distinct_users_projected.map(lambda x: (x.user_id, bcast.value[x.Prod_ID]))
The latter avoids the reference to the object (pt), so Spark only needs to ship the broadcast variable itself.
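For reference, here is a minimal, self-contained sketch of that pattern. The User namedtuple, the lookup data, and the RDD standing in for distinct_users_projected are made up for illustration; only the broadcast-then-capture-a-local-variable structure is the point.

from collections import namedtuple
from pyspark import SparkContext

# Hypothetical stand-in for the rows of distinct_users_projected.
User = namedtuple("User", ["user_id", "Prod_ID"])

sc = SparkContext(appName="broadcast-demo")

# Build the lookup map once on the driver, then broadcast it.
products_lookup_hmap = {"p1": 101, "p2": 102}  # made-up Prod_ID -> value mapping
bcast = sc.broadcast(products_lookup_hmap)

distinct_users_projected = sc.parallelize([User("u1", "p1"), User("u2", "p2")])

# The lambda closes over the local variable `bcast` only, so Spark serializes
# just the broadcast handle, never an object that holds the SparkContext.
pairs = distinct_users_projected.map(lambda x: (x.user_id, bcast.value[x.Prod_ID]))

print(pairs.collect())  # [('u1', 101), ('u2', 102)]
sc.stop()

The same rule applies to any self.attribute you use inside a transformation: copy it into a local variable first, so the closure does not drag the whole instance (and anything it references, such as sc) along with it.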