 

Spark job in Java: how to access files from 'resources' when run on a cluster

I wrote a Spark job in Java. The job is packaged as a shaded jar and executed:

spark-submit my-jar.jar

In the code, there are some files (Freemarker templates) that reside in src/main/resources/templates. When run locally, I'm able to access the files:

File[] files = new File("src/main/resources/templates/").listFiles();

When the job is run on a cluster, a NullPointerException is thrown when the previous line is executed.

If I run jar tf my-jar.jar I can see that the files are packaged in a templates/ folder:

 [...]
 templates/
 templates/my_template.ftl
 [...]

I'm just unable to read them; I suspect that .listFiles() tries to access the local filesystem on the cluster node, where the files don't exist, so it returns null.
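In other words (a quick illustrative sketch, not code from my actual job), the entries are classpath resources rather than files, so the classloader can see them while a relative filesystem path cannot:

// Illustrative: inside the shaded jar the template is a classpath resource
// (MyJob is a placeholder for any class packaged in the jar)
java.net.URL url = MyJob.class.getClassLoader().getResource("templates/my_template.ftl"); // found on the cluster
File[] files = new File("src/main/resources/templates/").listFiles(); // null on the cluster: no such directory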

I'm curious to know how I should package files to be used within a self-contained Spark job. I'd rather not copy them to HDFS outside of the job because it becomes messy to maintain.

asked Apr 17 '16 by Alex Woolford


People also ask

How do I access Spark files?

To access a file in Spark jobs, use SparkFiles.get() with the filename to find its download location. A directory can be given if the recursive option is set to True. Currently, directories are only supported for Hadoop-supported filesystems.
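For example (an illustrative sketch; the filename is borrowed from the question above), a file shipped at submit time can be resolved on whichever node the code runs:

// Ship the file with the job:
//   spark-submit --files templates/my_template.ftl my-jar.jar
// Then resolve its local download location on the current node:
String path = org.apache.spark.SparkFiles.get("my_template.ftl"); // absolute local path
File template = new File(path);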

When a Spark job is submitted, what happens on the cluster?

With Spark, the underlying execution works like this: one driver program works with the cluster manager to schedule tasks on the worker nodes. Once those tasks complete, they return their results to the driver program.

Where does the Spark driver run in cluster mode?

Cluster mode: the Spark driver runs in the application master, which is the first container started when the Spark job runs.
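For example, on YARN that mode is chosen at submit time:

spark-submit --master yarn --deploy-mode cluster my-jar.jar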

How does a Spark program physically execute on a cluster?

A Spark program implicitly creates a logical directed acyclic graph (DAG) of operations. When the driver runs, it converts this logical graph into a physical execution plan. For example, collect is an action that gathers all the data and returns a final result.
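A minimal Java sketch of that behavior (sc is assumed to be an existing JavaSparkContext):

JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3));
JavaRDD<Integer> doubled = nums.map(x -> x * 2); // transformation: only extends the DAG
List<Integer> result = doubled.collect();        // action: the DAG executes and results return to the driver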


1 Answer

I accessed my resource file as shown below in Spark/Scala. Here is my code:

// Load the resource from the classpath (this works inside the jar on the cluster)
val fs = this.getClass.getClassLoader.getResourceAsStream("smoke_test/loadhadoop.txt")

// Read the whole stream into a String
val dataString = scala.io.Source.fromInputStream(fs).mkString
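Since the question is in Java, the same classpath approach there would look roughly like this (a sketch; MyJob is a placeholder class name, and the templates/ path matches the jar listing in the question):

// Read a packaged template from the classpath instead of the filesystem
InputStream in = MyJob.class.getClassLoader().getResourceAsStream("templates/my_template.ftl");
String content = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
        .lines()
        .collect(Collectors.joining("\n"));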
answered Nov 15 '22 by Anand