I have a simple Spark program and I get the following error -
Error:-
ImportError: No module named add_num
Command used to run :-
./bin/spark-submit /Users/workflow/test_task.py
Code:-
from __future__ import print_function
from pyspark.sql import SparkSession
from add_num import add_two_nos

def map_func(x):
    print(add_two_nos(5))
    return x*x

def main():
    spark = SparkSession\
        .builder\
        .appName("test-task")\
        .master("local[*]")\
        .getOrCreate()
    rdd = spark.sparkContext.parallelize([1,2,3,4,5])  # distribute the list as an RDD
    rdd = rdd.map(map_func)  # apply map_func to each element
    print(rdd.collect())
    spark.stop()

if __name__ == "__main__":
    main()
Function code (add_num.py):-
def add_two_nos(x):
    return x*x
You can specify the .py file from which you wish to import in the code itself by adding the statement sc.addPyFile(path), where sc is the SparkContext (spark.sparkContext in the code above).
The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.
Then import the function after that call: from add_num import add_two_nos
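As a rough sketch of how that could look in the script from the question: the version below registers the file with addPyFile before anything imports add_num, and moves the import inside map_func so it is resolved only after the file has been shipped. Taking the location of add_num.py from the command line is an assumption made for illustration, since the question does not say where the file lives.

```python
import sys


def map_func(x):
    # Import here rather than at the top of the script, so the module is
    # looked up only after addPyFile has made it available.
    from add_num import add_two_nos
    print(add_two_nos(5))
    return x * x


def main(add_num_path):
    # pyspark is imported lazily so map_func can be tried without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("test-task")
        .master("local[*]")
        .getOrCreate()
    )
    # Register add_num.py with the SparkContext before running the job.
    # add_num_path is supplied by the caller (an assumption of this sketch).
    spark.sparkContext.addPyFile(add_num_path)

    rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])
    rdd = rdd.map(map_func)
    print(rdd.collect())
    spark.stop()


# Run the job only when the path to add_num.py is given as an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

To try it, pass the location of add_num.py as the first argument after the script path in the spark-submit command.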