
Why do we need to add "fork in run := true" when running Spark SBT application?

I have built a simple Spark app using sbt. Here's my code:

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("BigApple").getOrCreate()

    import spark.implicits._

    val ds = Seq(1, 2, 3).toDS()
    ds.map(_ + 1).foreach(x => println(x))
  }
}

Here is my build.sbt:

name := """sbt-sample-app"""

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.6" % "test"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.1"

Now when I run sbt run, I get the following error:

$ sbt run
[info] Loading global plugins from /home/user/.sbt/0.13/plugins
[info] Loading project definition from /home/user/Projects/sample-app/project
[info] Set current project to sbt-sample-app (in build file:/home/user/Projects/sample-app/)
[info] Running HelloWorld 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/06/01 10:09:10 INFO SparkContext: Running Spark version 2.1.1
17/06/01 10:09:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/01 10:09:11 WARN Utils: Your hostname, user-Vostro-15-3568 resolves to a loopback address: 127.0.1.1; using 127.0.0.1 instead (on interface enp3s0)
17/06/01 10:09:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/06/01 10:09:11 INFO SecurityManager: Changing view acls to: user
17/06/01 10:09:11 INFO SecurityManager: Changing modify acls to: user
17/06/01 10:09:11 INFO SecurityManager: Changing view acls groups to: 
17/06/01 10:09:11 INFO SecurityManager: Changing modify acls groups to: 
17/06/01 10:09:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(user); groups with view permissions: Set(); users  with modify permissions: Set(user); groups with modify permissions: Set()
17/06/01 10:09:12 INFO Utils: Successfully started service 'sparkDriver' on port 39662.
17/06/01 10:09:12 INFO SparkEnv: Registering MapOutputTracker
17/06/01 10:09:12 INFO SparkEnv: Registering BlockManagerMaster
17/06/01 10:09:12 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/06/01 10:09:12 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/06/01 10:09:12 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-c6db1535-6a00-4760-93dc-968722e3d596
17/06/01 10:09:12 INFO MemoryStore: MemoryStore started with capacity 408.9 MB
17/06/01 10:09:13 INFO SparkEnv: Registering OutputCommitCoordinator
17/06/01 10:09:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/06/01 10:09:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://127.0.0.1:4040
17/06/01 10:09:13 INFO Executor: Starting executor ID driver on host localhost
17/06/01 10:09:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34488.
17/06/01 10:09:13 INFO NettyBlockTransferService: Server created on 127.0.0.1:34488
17/06/01 10:09:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/06/01 10:09:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 127.0.0.1, 34488, None)
17/06/01 10:09:13 INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:34488 with 408.9 MB RAM, BlockManagerId(driver, 127.0.0.1, 34488, None)
17/06/01 10:09:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 127.0.0.1, 34488, None)
17/06/01 10:09:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 127.0.0.1, 34488, None)
17/06/01 10:09:14 INFO SharedState: Warehouse path is 'file:/home/user/Projects/sample-app/spark-warehouse'.
[error] (run-main-0) scala.ScalaReflectionException: class scala.Option in JavaMirror with ClasspathFilter(
[error]   parent = URLClassLoader with NativeCopyLoader with RawResources(
[error]   urls = List(/home/user/Projects/sample-app/target/scala-2.11/classes, ...,/home/user/.ivy2/cache/org.apache.parquet/parquet-jackson/jars/parquet-jackson-1.8.1.jar),
[error]   parent = java.net.URLClassLoader@7c4113ce,
[error]   resourceMap = Set(app.class.path, boot.class.path),
[error]   nativeTemp = /tmp/sbt_c2afce
[error] )
[error]   root = sun.misc.Launcher$AppClassLoader@677327b6
[error]   cp = Set(/home/user/.ivy2/cache/org.glassfish.jersey.core/jersey-common/jars/jersey-common-2.22.2.jar, ..., /home/user/.ivy2/cache/net.razorvine/pyrolite/jars/pyrolite-4.13.jar)
[error] ) of type class sbt.classpath.ClasspathFilter with classpath [<unknown>] and parent being URLClassLoader with NativeCopyLoader with RawResources(
[error]   urls = List(/home/user/Projects/sample-app/target/scala-2.11/classes, ..., /home/user/.ivy2/cache/org.apache.parquet/parquet-jackson/jars/parquet-jackson-1.8.1.jar),
[error]   parent = java.net.URLClassLoader@7c4113ce,
[error]   resourceMap = Set(app.class.path, boot.class.path),
[error]   nativeTemp = /tmp/sbt_c2afce
[error] ) of type class sbt.classpath.ClasspathUtilities$$anon$1 with classpath [file:/home/user/Projects/sample-app/target/scala-2.11/classes/,...openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/classes] not found.
scala.ScalaReflectionException: class scala.Option in JavaMirror with ClasspathFilter(
  parent = URLClassLoader with NativeCopyLoader with RawResources(
  urls = List(/home/user/Projects/sample-app/target/scala-2.11/classes, ..., /home/user/.ivy2/cache/org.apache.parquet/parquet-jackson/jars/parquet-jackson-1.8.1.jar),
  parent = java.net.URLClassLoader@7c4113ce,
  resourceMap = Set(app.class.path, boot.class.path),
  nativeTemp = /tmp/sbt_c2afce
)
  root = sun.misc.Launcher$AppClassLoader@677327b6
  cp = Set(/home/user/.ivy2/cache/org.glassfish.jersey.core/jersey-common/jars/jersey-common-2.22.2.jar, ..., /home/user/.ivy2/cache/net.razorvine/pyrolite/jars/pyrolite-4.13.jar)
) of type class sbt.classpath.ClasspathFilter with classpath [<unknown>] and parent being URLClassLoader with NativeCopyLoader with RawResources(
  urls = List(/home/user/Projects/sample-app/target/scala-2.11/classes, ..., /home/user/.ivy2/cache/org.apache.parquet/parquet-jackson/jars/parquet-jackson-1.8.1.jar),
  parent = java.net.URLClassLoader@7c4113ce,
  resourceMap = Set(app.class.path, boot.class.path),
  nativeTemp = /tmp/sbt_c2afce
) of type class sbt.classpath.ClasspathUtilities$$anon$1 with classpath [file:/home/user/Projects/sample-app/target/scala-2.11/classes/,.../jre/lib/charsets.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/classes] not found.
    at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:123)
    at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:22)
    at org.apache.spark.sql.catalyst.ScalaReflection$$typecreator42$1.apply(ScalaReflection.scala:614)
    at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232)
    at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:782)
    at org.apache.spark.sql.catalyst.ScalaReflection$.localTypeOf(ScalaReflection.scala:39)
    at org.apache.spark.sql.catalyst.ScalaReflection$.optionOfProductType(ScalaReflection.scala:614)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:51)
    at org.apache.spark.sql.Encoders$.scalaInt(Encoders.scala:281)
    at org.apache.spark.sql.SQLImplicits.newIntEncoder(SQLImplicits.scala:54)
    at HelloWorld$.main(HelloWorld.scala:9)
    at HelloWorld.main(HelloWorld.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
[trace] Stack trace suppressed: run last compile:run for the full output.
17/06/01 10:09:15 ERROR ContextCleaner: Error in cleaning thread
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:181)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
    at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178)
    at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73)
17/06/01 10:09:15 ERROR Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:80)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
17/06/01 10:09:15 ERROR Utils: throw uncaught fatal error in thread SparkListenerBus
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:80)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
17/06/01 10:09:15 INFO SparkUI: Stopped Spark web UI at http://127.0.0.1:4040
java.lang.RuntimeException: Nonzero exit code: 1
    at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 7 s, completed 1 Jun, 2017 10:09:15 AM

But when I add fork in run := true to build.sbt, the app runs fine.

New build.sbt:

name := """sbt-sample-app"""

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.6" % "test"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.1"

fork in run := true

Here's the output:

$ sbt run
[info] Loading global plugins from /home/user/.sbt/0.13/plugins
[info] Loading project definition from /home/user/Projects/sample-app/project
[info] Set current project to sbt-sample-app (in build file:/home/user/Projects/sample-app/)
[success] Total time: 0 s, completed 1 Jun, 2017 10:15:43 AM
[info] Updating {file:/home/user/Projects/sample-app/}sample-app...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[warn] Scala version was updated by one of library dependencies:
[warn]  * org.scala-lang:scala-library:(2.11.7, 2.11.0) -> 2.11.8
[warn] To force scalaVersion, add the following:
[warn]  ivyScala := ivyScala.value map { _.copy(overrideScalaVersion = true) }
[warn] Run 'evicted' to see detailed eviction warnings
[info] Compiling 1 Scala source to /home/user/Projects/sample-app/target/scala-2.11/classes...
[info] Running HelloWorld 
[error] Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
[error] 17/06/01 10:16:13 INFO SparkContext: Running Spark version 2.1.1
[error] 17/06/01 10:16:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[error] 17/06/01 10:16:14 WARN Utils: Your hostname, user-Vostro-15-3568 resolves to a loopback address: 127.0.1.1; using 127.0.0.1 instead (on interface enp3s0)
[error] 17/06/01 10:16:14 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
[error] 17/06/01 10:16:14 INFO SecurityManager: Changing view acls to: user
[error] 17/06/01 10:16:14 INFO SecurityManager: Changing modify acls to: user
[error] 17/06/01 10:16:14 INFO SecurityManager: Changing view acls groups to: 
[error] 17/06/01 10:16:14 INFO SecurityManager: Changing modify acls groups to: 
[error] 17/06/01 10:16:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(user); groups with view permissions: Set(); users  with modify permissions: Set(user); groups with modify permissions: Set()
[error] 17/06/01 10:16:14 INFO Utils: Successfully started service 'sparkDriver' on port 37747.
[error] 17/06/01 10:16:14 INFO SparkEnv: Registering MapOutputTracker
[error] 17/06/01 10:16:14 INFO SparkEnv: Registering BlockManagerMaster
[error] 17/06/01 10:16:14 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
[error] 17/06/01 10:16:14 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
[error] 17/06/01 10:16:14 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-edf40c39-a13e-4930-8e9a-64135bfa9770
[error] 17/06/01 10:16:14 INFO MemoryStore: MemoryStore started with capacity 1405.2 MB
[error] 17/06/01 10:16:14 INFO SparkEnv: Registering OutputCommitCoordinator
[error] 17/06/01 10:16:14 INFO Utils: Successfully started service 'SparkUI' on port 4040.
[error] 17/06/01 10:16:15 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://127.0.0.1:4040
[error] 17/06/01 10:16:15 INFO Executor: Starting executor ID driver on host localhost
[error] 17/06/01 10:16:15 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39113.
[error] 17/06/01 10:16:15 INFO NettyBlockTransferService: Server created on 127.0.0.1:39113
[error] 17/06/01 10:16:15 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
[error] 17/06/01 10:16:15 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 127.0.0.1, 39113, None)
[error] 17/06/01 10:16:15 INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:39113 with 1405.2 MB RAM, BlockManagerId(driver, 127.0.0.1, 39113, None)
[error] 17/06/01 10:16:15 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 127.0.0.1, 39113, None)
[error] 17/06/01 10:16:15 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 127.0.0.1, 39113, None)
[error] 17/06/01 10:16:15 INFO SharedState: Warehouse path is 'file:/home/user/Projects/sample-app/spark-warehouse/'.
[error] 17/06/01 10:16:18 INFO CodeGenerator: Code generated in 395.134683 ms
[error] 17/06/01 10:16:19 INFO CodeGenerator: Code generated in 9.077969 ms
[error] 17/06/01 10:16:19 INFO CodeGenerator: Code generated in 23.652705 ms
[error] 17/06/01 10:16:19 INFO SparkContext: Starting job: foreach at HelloWorld.scala:10
[error] 17/06/01 10:16:19 INFO DAGScheduler: Got job 0 (foreach at HelloWorld.scala:10) with 1 output partitions
[error] 17/06/01 10:16:19 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at HelloWorld.scala:10)
[error] 17/06/01 10:16:19 INFO DAGScheduler: Parents of final stage: List()
[error] 17/06/01 10:16:19 INFO DAGScheduler: Missing parents: List()
[error] 17/06/01 10:16:19 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at foreach at HelloWorld.scala:10), which has no missing parents
[error] 17/06/01 10:16:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.3 KB, free 1405.2 MB)
[error] 17/06/01 10:16:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.3 KB, free 1405.2 MB)
[error] 17/06/01 10:16:20 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 127.0.0.1:39113 (size: 3.3 KB, free: 1405.2 MB)
[error] 17/06/01 10:16:20 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996
[error] 17/06/01 10:16:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at foreach at HelloWorld.scala:10)
[error] 17/06/01 10:16:20 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
[error] 17/06/01 10:16:20 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 6227 bytes)
[error] 17/06/01 10:16:20 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
[info] 2
[info] 3
[info] 4
[error] 17/06/01 10:16:20 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1231 bytes result sent to driver
[error] 17/06/01 10:16:20 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 152 ms on localhost (executor driver) (1/1)
[error] 17/06/01 10:16:20 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
[error] 17/06/01 10:16:20 INFO DAGScheduler: ResultStage 0 (foreach at HelloWorld.scala:10) finished in 0.181 s
[error] 17/06/01 10:16:20 INFO DAGScheduler: Job 0 finished: foreach at HelloWorld.scala:10, took 0.596960 s
[error] 17/06/01 10:16:20 INFO SparkContext: Invoking stop() from shutdown hook
[error] 17/06/01 10:16:20 INFO SparkUI: Stopped Spark web UI at http://127.0.0.1:4040
[error] 17/06/01 10:16:20 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
[error] 17/06/01 10:16:20 INFO MemoryStore: MemoryStore cleared
[error] 17/06/01 10:16:20 INFO BlockManager: BlockManager stopped
[error] 17/06/01 10:16:20 INFO BlockManagerMaster: BlockManagerMaster stopped
[error] 17/06/01 10:16:20 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
[error] 17/06/01 10:16:20 INFO SparkContext: Successfully stopped SparkContext
[error] 17/06/01 10:16:20 INFO ShutdownHookManager: Shutdown hook called
[error] 17/06/01 10:16:20 INFO ShutdownHookManager: Deleting directory /tmp/spark-77d00e78-9f76-4ab2-bc40-0b99940661ac
[success] Total time: 37 s, completed 1 Jun, 2017 10:16:20 AM

Can anyone help me understand the reason behind this?

himanshuIIITian asked Jun 01 '17 04:06


3 Answers

Excerpt from "Getting Started with SBT for Scala" By Shiti Saxena

Why do we need to fork JVM?

When a user runs code using run or console commands, the code is run on the same virtual machine as SBT. In some cases, running of code may cause SBT to crash, such as a System.exit call or unterminated threads (for example, when running tests on code while simultaneously working on the code).

If a test causes the JVM to shut down, you would need to restart SBT. To avoid such scenarios, forking the JVM is important.

You do not need to fork the JVM to run your code if the code satisfies the following constraints; otherwise it must be run in a forked JVM:

  • No threads are created or the program ends when user-created threads terminate on their own
  • System.exit is used to end the program and user-created threads terminate when interrupted
  • No deserialization is done or deserialization code ensures that the right class loader is used
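
The stack trace in the question appears to fit the third constraint: Scala reflection is asked for scala.Option through a mirror backed by sbt's ClasspathFilter loader, not a plain application class loader. As a rough diagnostic sketch (not taken from the book; the object name ClassLoaderChain is made up for illustration), the following prints the class loader chains visible to the running code, so the output of sbt run can be compared with and without forking:

```scala
import scala.annotation.tailrec

// Hypothetical helper for illustration only; it is not part of sbt or Spark.
object ClassLoaderChain {

  // Walk from a class loader up through its parents and collect the
  // class name of each loader. A null loader yields an empty list.
  @tailrec
  def chain(cl: ClassLoader, acc: List[String] = Nil): List[String] =
    if (cl == null) acc.reverse
    else chain(cl.getParent, cl.getClass.getName :: acc)

  def main(args: Array[String]): Unit = {
    // The loader that reflection-based libraries often consult.
    val context = Thread.currentThread().getContextClassLoader
    // The loader that defined this object.
    val defining = getClass.getClassLoader

    println("context loader chain:  " + chain(context).mkString(" -> "))
    println("defining loader chain: " + chain(defining).mkString(" -> "))
  }
}
```

If the chain printed without forking contains sbt-specific loaders and the forked run shows only the JVM's standard loaders, that is consistent with the class loader explanation, though it does not prove it.
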
ZakukaZ answered Oct 17 '22 02:10


From the sbt documentation on forking:

By default, the run task runs in the same JVM as sbt. Forking is required under certain circumstances, however. Or, you might want to fork Java processes when implementing new tasks.

By default, a forked process uses the same Java and Scala versions being used for the build and the working directory and JVM options of the current process. This page discusses how to enable and configure forking for both run and test tasks. Each kind of task may be configured separately by scoping the relevant keys as explained below.

To enable forking for the run task, simply use:

fork in run := true
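
As the quoted documentation says, each kind of task is configured by scoping the relevant keys. A minimal build.sbt sketch of what that can look like, using the sbt 0.13 syntax from the question; the -Xmx2G value is an assumed example, not a recommendation:

```scala
// build.sbt fragment (sbt 0.13 syntax, matching the question)

// Run the application in a separate JVM instead of inside sbt's own JVM.
fork in run := true

// Assumed example value. javaOptions are only applied to forked processes.
javaOptions in run += "-Xmx2G"

// Optional: send the forked process's output straight to the console
// instead of routing it through sbt's logger.
outputStrategy := Some(StdoutOutput)

// Forking for tests is configured separately from run.
fork in Test := true

// In sbt 1.x the first setting is usually written as: run / fork := true
```

By default sbt logs a forked process's standard error at the error level, which is why the Spark log lines in the forked run above carry an [error] prefix even though the job succeeded.
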
koiralo answered Oct 17 '22 03:10


I couldn't find out exactly why, but this is their build file and recommendation:

https://github.com/deanwampler/spark-scala-tutorial/blob/master/project/Build.scala

Hope someone can give a better answer.

Edited code:

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("BigApple").getOrCreate()

    import spark.implicits._

    val ds = Seq(1, 2, 3).toDS()
    ds.map(_ + 1).foreach(x => println(x))
  }
}

build.sbt

name := """untitled"""

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.6" % "test"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.1"
Sam Upra answered Oct 17 '22 03:10