An overview of my environments: Mac OS Yosemite, Play framework 2.3.7, sbt 0.13.7, Intellij Idea 14, java 1.8.0_25
I tried to run a simple Spark program in Play framework, so I just create a Play 2 project in Intellij, and change some files as follows:
app/Controllers/Application.scala:
package controllers
import play.api._
import play.api.libs.iteratee.Enumerator
import play.api.mvc._
object Application extends Controller {
def index = Action {
Ok(views.html.index("Your new application is ready."))
}
def trySpark = Action {
Ok.chunked(Enumerator(utils.TrySpark.runSpark))
}
}
app/utils/TrySpark.scala:
package utils
import org.apache.spark.{SparkContext, SparkConf}
object TrySpark {
def runSpark: String = {
val conf = new SparkConf().setAppName("trySpark").setMaster("local[4]")
val sc = new SparkContext(conf)
val data = sc.textFile("public/data/array.txt")
val array = data.map ( line => line.split(' ').map(_.toDouble) )
val sum = array.first().reduce( (a, b) => a + b )
return sum.toString
}
}
public/data/array.txt:
1 2 3 4 5 6 7
conf/routes:
GET / controllers.Application.index
GET /spark controllers.Application.trySpark
GET /assets/*file controllers.Assets.at(path="/public", file)
build.sbt:
name := "trySpark"
version := "1.0"
lazy val `tryspark` = (project in file(".")).enablePlugins(PlayScala)
scalaVersion := "2.10.4"
libraryDependencies ++= Seq( jdbc , anorm , cache , ws,
"org.apache.spark" % "spark-core_2.10" % "1.2.0")
unmanagedResourceDirectories in Test <+= baseDirectory ( _ /"target/web/public/test" )
I type activator run
to run this app in development mode then type localhost:9000/spark
in the browser, it shows result 28
as expected. However, when I want type activator start
to run this app in production mode it shows the following error message:
[info] play - Application started (Prod)
[info] play - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
[error] application -
! @6kik15fee - Internal server error, for (GET) [/spark] ->
play.api.Application$$anon$1: Execution exception[[InvalidInputException: Input path does not exist: file:/Path/to/my/project/target/universal/stage/public/data/array.txt]]
at play.api.Application$class.handleError(Application.scala:296) ~[com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at play.api.DefaultApplication.handleError(Application.scala:402) [com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$14$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:205) [com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$14$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:202) [com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) [org.scala-lang.scala-library-2.10.4.jar:na]
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Path/to/my/project/target/universal/stage/public/data/array.txt
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.2.0.jar:na]
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.2.0.jar:na]
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) ~[org.apache.spark.spark-core_2.10-1.2.0.jar:1.2.0]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) ~[org.apache.spark.spark-core_2.10-1.2.0.jar:1.2.0]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) ~[org.apache.spark.spark-core_2.10-1.2.0.jar:1.2.0]
It seems that my array.txt
file is not loaded in the production mode. How can solve this problem?
The problem here is that the public
directory will not be available in your root project dir when you run in production. It is packaged as a jar (usually in STAGE_DIR/lib/PROJ_NAME-VERSION-assets.jar
) so you will not be able to access them this way.
I can see two solutions here:
1) Place the file in the conf
directory. This will work, but seems very dirty especially if you intend to use more data files;
2) Place those files in some directory and tell sbt to package it as well. You can keep using the public
directory although it seems better to use a different dir especially if you would want to have many more files.
Supposing array.txt
is placed in a dir named datafiles
in your project root, you can add this to build.sbt
:
mappings in Universal ++=
(baseDirectory.value / "datafiles" * "*" get) map
(x => x -> ("datafiles/" + x.getName))
Don't forget to change the paths in your app code:
// (...)
val data = sc.textFile("datafiles/array.txt")
Then just do a clean and when you run either start
, stage
or dist
those files will be available.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With