Function input() in pyspark

My problem here is that when I enter the value of p, nothing happens and execution does not continue. Is there a way to fix this?

import sys
from pyspark import SparkContext

sc = SparkContext("local", "simple App")

p = input("Enter the word")
rdd1 = sc.textFile("monfichier")
rdd2 = rdd1.map(lambda l: l.split("\t"))
rdd3 = rdd2.map(lambda l: l[1])
print(rdd3.take(6))
rdd5 = rdd3.filter(lambda l: p in l)

sc.stop()
asked Jan 08 '17 by user7383383

2 Answers

You can use py4j to read input via Java:

from py4j.java_gateway import JavaGateway

# Read a line from standard input through the JVM's java.util.Scanner,
# bypassing Python's input().
scanner = sc._gateway.jvm.java.util.Scanner
sys_in = getattr(sc._gateway.jvm.java.lang.System, 'in')
result = scanner(sys_in).nextLine()
print(result)

Depending on your environment and Spark version, you might need to replace sc with spark.sparkContext.

answered Oct 22 '22 by deronwu

You have to distinguish between two different cases:

  • Script submitted with $SPARK_HOME/bin/spark-submit script.py

    In this case you execute a Scala application which in turn starts a Python interpreter. Since the Scala application doesn't expect any interaction from standard input, let alone pass it on to the Python interpreter, your Python script will simply hang, waiting for data that will never come.

  • Script executed directly using Python interpreter (python script.py).

    You should be able to use input directly, but at the cost of handling all the configuration details, normally handled by spark-submit / org.apache.spark.deploy.SparkSubmit, manually in your code.
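One way to avoid the hang in the first case is to prompt only when stdin is actually attached to a terminal. This is a sketch, not part of the original answer; read_word and its default value are hypothetical names:

```python
import sys

def read_word(default="spark"):
    # Only prompt when stdin is an interactive terminal; under
    # spark-submit stdin is typically not a TTY and input() would hang.
    if sys.stdin.isatty():
        return input("Enter the word: ")
    return default
```

When the script runs non-interactively (e.g. under spark-submit or with piped input), the function falls back to the default instead of blocking.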

In general, all required arguments for your script can be passed on the command line

$SPARK_HOME/bin/spark-submit script.py some_app_arg another_app_arg

and accessed using standard methods like sys.argv or argparse; using input is neither necessary nor useful.
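For illustration, a minimal sketch of the argparse route. The --word flag is a hypothetical name; in a real script you would omit the explicit argument list so argparse reads sys.argv:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--word", required=True, help="word to filter lines by")

# Stand-in for a real invocation such as:
#   $SPARK_HOME/bin/spark-submit script.py --word foo
args = parser.parse_args(["--word", "foo"])
print(args.word)  # → foo
```

The parsed value can then be used in the filter, e.g. rdd3.filter(lambda l: args.word in l), without any interactive prompt.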

answered Oct 22 '22 by zero323