I am running a Spark cluster on Google Cloud and I upload a configuration file with each job. What is the path to a file that is uploaded with a submit command?
In the example below, how can I read the file Configuration.properties before the SparkContext has been initialized? I am using Scala.
gcloud dataproc jobs submit spark --cluster my-cluster --class MyJob --files config/Configuration.properties --jars my.jar
The local path to a file distributed using the SparkFiles mechanism (the --files argument or the SparkContext.addFile method) can be obtained using SparkFiles.get:
org.apache.spark.SparkFiles.get(fileName)
You can also get the path to the root directory using SparkFiles.getRootDirectory:
org.apache.spark.SparkFiles.getRootDirectory
You can use these combined with standard IO utilities to read the files.
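As a minimal sketch of combining SparkFiles.get with standard IO, the job below loads the properties file once the context exists. It assumes the file is looked up by its bare name (without the config/ prefix used on the command line); the object layout and the property key some.key are hypothetical and only there for illustration.

```scala
import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object MyJob {
  // Load a java.util.Properties file from a local filesystem path.
  def loadProperties(path: String): Properties = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    props
  }

  def main(args: Array[String]): Unit = {
    // SparkFiles can only be resolved after the context has been created.
    val sc = new SparkContext(new SparkConf().setAppName("MyJob"))

    // Assumption: the file shipped with --files is addressed by file name only.
    val path = SparkFiles.get("Configuration.properties")
    val props = loadProperties(path)

    // "some.key" is a hypothetical property name.
    println(props.getProperty("some.key"))

    sc.stop()
  }
}
```

The same loadProperties helper works on executors as well, as long as SparkFiles.get is called inside the task so the path is resolved on the node where the task runs.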
how can I read the file Configuration.properties before the SparkContext has been initialized?
SparkFiles are distributed by the driver, so they cannot be accessed before the context has been initialized, and to be distributed in the first place they have to be accessible from the driver node. So this part of the question depends solely on what type of storage you use to expose the file to the driver node.
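For illustration only, here is one way this could look if the file is already readable from the driver's local filesystem. This is a sketch under that assumption, not a statement about where Dataproc places the file: the path is passed as a program argument, and the object name and the property key app.name are hypothetical.

```scala
import java.io.FileInputStream
import java.util.Properties

object ConfigBeforeContext {
  // Load a java.util.Properties file from a local filesystem path.
  def loadProperties(path: String): Properties = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    props
  }

  def main(args: Array[String]): Unit = {
    // Assumption: args(0) is a driver-local path to Configuration.properties.
    val props = loadProperties(args(0))

    // "app.name" is a hypothetical key; "MyJob" is the fallback value.
    val appName = props.getProperty("app.name", "MyJob")
    println(s"Loaded ${props.size} properties; app name: $appName")

    // The SparkContext would be created after this point, using values from props.
  }
}
```

Because this uses only plain Java IO, it does not depend on SparkFiles and can run before any Spark objects are created.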