I want to store Spark arguments such as the input file and output file in a Java properties file and pass that file to the Spark driver. I'm using spark-submit to submit the job but couldn't find a parameter to pass the properties file. Do you have any suggestions?
A properties file can be used in Java to externalize configuration and to store key-value pairs. The Properties.load() method of the Properties class is a convenient way to load a .properties file as key-value pairs.
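For example, a minimal standalone sketch (the file name config.properties and the key input.path are made up for illustration):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class LoadProps {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Read key-value pairs from a .properties file on the local filesystem
        try (FileInputStream in = new FileInputStream("config.properties")) {
            props.load(in);
        }
        System.out.println(props.getProperty("input.path"));
    }
}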
Spark properties are the means of tuning the execution environment for your Spark applications. The default Spark properties file is $SPARK_HOME/conf/spark-defaults.conf, which can be overridden using spark-submit's --properties-file command-line option.
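For reference, entries in spark-defaults.conf use a whitespace-separated key-value format; a minimal sketch (the values here are illustrative, not recommendations):

spark.master          spark://master-host:7077
spark.executor.memory 4g
spark.serializer      org.apache.spark.serializer.KryoSerializer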
.properties is a file extension mainly used in Java-related technologies to store the configurable parameters of an application. Generally, these files are used to store static information as key-value pairs.
Here I found one solution:
Props file (mypropsfile.conf). Note: prefix your keys with "spark.", otherwise the properties will be ignored.
spark.myapp.input /input/path
spark.myapp.output /output/path
Launch:
$SPARK_HOME/bin/spark-submit --properties-file mypropsfile.conf --class <main-class> <application-jar>
How to access the properties in code:
sc.getConf.get("spark.driver.host")  // localhost
sc.getConf.get("spark.myapp.input")  // /input/path
sc.getConf.get("spark.myapp.output") // /output/path
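Since the question asks about Java, here is the equivalent lookup in a Java driver, as a minimal sketch (assuming the mypropsfile.conf above; the class name and the read/write calls are illustrative):

import org.apache.spark.sql.SparkSession;

public class MyApp {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();

        // Keys passed via --properties-file end up in the Spark conf
        String input = spark.conf().get("spark.myapp.input");   // /input/path
        String output = spark.conf().get("spark.myapp.output"); // /output/path

        spark.read().textFile(input).write().text(output);
        spark.stop();
    }
}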
The previous answer's approach has the restriction that every property in the property file must start with spark, e.g.:

spark.myapp.input
spark.myapp.output

Now suppose you have a property which doesn't start with spark:
job.property:
app.name=xyz
$SPARK_HOME/bin/spark-submit --properties-file job.property
Spark will ignore all properties that don't have the spark. prefix, with the message:

Warning: Ignoring non-spark config property: app.name=xyz
Here is how I manage the property file in the application's driver and executors:
${SPARK_HOME}/bin/spark-submit --files job.properties
Java code to access the cached file (job.properties):
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkFiles;

// Load the file into a Properties object using the HDFS FileSystem API.
// SparkFiles.get() returns the absolute path of a file shipped with --files.
String fileName = SparkFiles.get("job.properties");
Configuration hdfsConf = new Configuration();
FileSystem fs = FileSystem.get(hdfsConf);
FSDataInputStream is = fs.open(new Path(fileName));

// Or use plain Java IO instead:
// InputStream is = new FileInputStream("/res/example.xls");

Properties prop = new Properties();
// Load the properties
prop.load(is);
// Retrieve a property
prop.getProperty("app.name");
If you have environment-specific properties (dev/test/prod), supply a custom APP_ENV Java system property in spark-submit:
${SPARK_HOME}/bin/spark-submit \
  --conf "spark.driver.extraJavaOptions=-DAPP_ENV=dev" \
  --conf "spark.executor.extraJavaOptions=-DAPP_ENV=dev" \
  --properties-file dev.property
Then replace the file name lookup in your driver or executor code (note that SparkFiles.get only resolves files shipped with --files, so the environment-specific file should be passed via --files as well):
// Load the properties file for the current environment, e.g. dev.properties
String fileName = SparkFiles.get(System.getProperty("APP_ENV") + ".properties");
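A slightly more defensive variant falls back to a default environment when the system property is not set (the "dev" fallback is an assumption, not part of the original answer):

// Fall back to "dev" when -DAPP_ENV was not supplied (the default is an assumption)
String env = System.getProperty("APP_ENV", "dev");
String fileName = SparkFiles.get(env + ".properties");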