
How to load java properties file and use in Spark?

I want to store the Spark arguments such as input file, output file into a Java property files and pass that file into Spark Driver. I'm using spark-submit for submitting the job but couldn't find a parameter to pass the properties file. Have you got any suggestions?

diplomaticguru asked Jun 29 '15


People also ask

How do you load the data from the properties file?

The Properties file can be used in Java to externalize the configuration and to store key-value pairs. The Properties.load() method of the Properties class is a convenient way to load a .properties file as key-value pairs.
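As a minimal standalone sketch of Properties.load() (reading from a StringReader here in place of a FileInputStream over a real .properties file, so the example is self-contained):

```java
import java.io.StringReader;
import java.util.Properties;

public class PropsLoadDemo {
    public static void main(String[] args) throws Exception {
        // Properties.load accepts any Reader or InputStream; this string
        // stands in for the contents of a .properties file on disk.
        Properties props = new Properties();
        props.load(new StringReader("app.name=xyz\napp.mode=dev\n"));

        System.out.println(props.getProperty("app.name")); // xyz
        System.out.println(props.getProperty("app.mode")); // dev
    }
}
```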

What is properties file in Spark?

Spark properties are the means of tuning the execution environment for your Spark applications. The default Spark properties file is $SPARK_HOME/conf/spark-defaults.conf, which can be overridden using spark-submit's --properties-file command-line option.

How does Java properties file work?

.properties is a file extension for files mainly used in Java-related technologies to store the configurable parameters of an application. Generally, these files store static information as key-value pairs.


2 Answers

Here is one solution:

Props file (mypropsfile.conf). Note: prefix each key with "spark.", otherwise the property will be ignored.

spark.myapp.input /input/path
spark.myapp.output /output/path

Launch:

$SPARK_HOME/bin/spark-submit --properties-file  mypropsfile.conf 

How to read them in code:

sc.getConf.get("spark.driver.host")   // localhost
sc.getConf.get("spark.myapp.input")   // /input/path
sc.getConf.get("spark.myapp.output")  // /output/path
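As a side note, the whitespace-separated format shown in mypropsfile.conf is also valid java.util.Properties syntax (space, '=', and ':' all work as key-value separators), so the same file could be parsed directly with plain Java if ever needed outside Spark. A minimal standalone sketch, using a StringReader in place of the file:

```java
import java.io.StringReader;
import java.util.Properties;

public class SparkPropsFileDemo {
    public static void main(String[] args) throws Exception {
        // Same whitespace-separated format as mypropsfile.conf above.
        Properties props = new Properties();
        props.load(new StringReader(
            "spark.myapp.input /input/path\n" +
            "spark.myapp.output /output/path\n"));

        System.out.println(props.getProperty("spark.myapp.input"));  // /input/path
        System.out.println(props.getProperty("spark.myapp.output")); // /output/path
    }
}
```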
vijay kumar answered Sep 29 '22


The previous answer's approach has the restriction that every property in the property file must start with the spark. prefix, e.g.:

spark.myapp.input
spark.myapp.output

Now suppose you have a property that doesn't start with spark., such as:

job.property:

app.name=xyz

$SPARK_HOME/bin/spark-submit --properties-file  job.property 

Spark will ignore every property that doesn't have the spark. prefix, with the message:

Warning: Ignoring non-spark config property: app.name=xyz

Here is how I manage the property file in the application's driver and executors. Ship the file with --files:

${SPARK_HOME}/bin/spark-submit --files job.properties 

Java code to access the cached file (job.properties):

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkFiles;

// Load the file into a Properties object using the HDFS FileSystem
String fileName = SparkFiles.get("job.properties");
Configuration hdfsConf = new Configuration();
FileSystem fs = FileSystem.get(hdfsConf);

// The file name is the absolute path of the file
FSDataInputStream is = fs.open(new Path(fileName));

// Or use plain Java IO
// InputStream is = new FileInputStream(fileName);

Properties prop = new Properties();
// Load the properties
prop.load(is);
// Retrieve a property
prop.getProperty("app.name");

If you have environment-specific properties (dev/test/prod), supply a custom APP_ENV Java system property in spark-submit:

${SPARK_HOME}/bin/spark-submit \
  --conf "spark.driver.extraJavaOptions=-DAPP_ENV=dev" \
  --conf "spark.executor.extraJavaOptions=-DAPP_ENV=dev" \
  --properties-file dev.property

Then replace the file name in your driver or executor code:

// Load the environment-specific properties file
String fileName = SparkFiles.get(System.getProperty("APP_ENV") + ".properties");
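The System.getProperty lookup itself can be sketched standalone. Note the fallback default "dev" and the in-memory properties payload are illustrative additions so the sketch runs without spark-submit; they are not part of the original approach:

```java
import java.io.StringReader;
import java.util.Properties;

public class EnvPropsDemo {
    public static void main(String[] args) throws Exception {
        // -DAPP_ENV=dev from extraJavaOptions surfaces here; fall back to
        // "dev" so this sketch also runs without the flag (illustrative only).
        String env = System.getProperty("APP_ENV", "dev");
        String fileName = env + ".properties";
        System.out.println(fileName);

        // In the real job, fileName would be resolved via SparkFiles.get(...);
        // here a stand-in payload keeps the example self-contained.
        Properties prop = new Properties();
        prop.load(new StringReader("app.name=xyz\n"));
        System.out.println(prop.getProperty("app.name")); // xyz
    }
}
```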
Rahul Sharma answered Sep 29 '22