I am new to Spark and trying to figure out how I can use the Spark shell.
I looked through the documentation on Spark's site, and it doesn't show how to create directories or how to see all my files in the Spark shell. If anyone could help me, I would appreciate it.
The Spark shell is a command-line interface for running Spark processing interactively. Its commands are useful for ETL, analytics, and machine learning work on high-volume datasets in very little time.
Go to the Apache Spark installation directory from the command line, type bin/spark-shell, and press Enter. This launches the Spark shell and gives you a Scala prompt for interacting with Spark in Scala. If you have added Spark to your PATH, just enter spark-shell in the command line or terminal (Mac users).
Go to the Spark installation directory from the command line, type bin/pyspark, and press Enter. This launches the PySpark shell and gives you a prompt for interacting with Spark in Python. If you have added Spark to your PATH, just enter pyspark in the command line or terminal (Mac users).
Making this more systematic: put the wrapper code in a script (e.g. spark-script.sh), and then you can simply run ./spark-script.sh your_file.scala first_arg second_arg third_arg and have an Array[String] called args holding your arguments.
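The wrapper script itself is not shown in this text, so the following only sketches the Scala side: what a file like your_file.scala might do with args once it is available. The first line is a stand-in that mimics the array the wrapper is described as providing, and argOrElse is a hypothetical helper name, not something Spark or the REPL supplies.

```scala
// Stand-in for the array the wrapper script is described as defining.
// Assumption: remove this line when the wrapper already supplies `args`.
val args: Array[String] = "first_arg second_arg third_arg".split("\\s+")

// Hypothetical helper: return the argument at `index`, or `default`
// when fewer arguments were passed.
def argOrElse(index: Int, default: String): String =
  if (index >= 0 && index < args.length) args(index) else default

println(s"received ${args.length} argument(s)")
println(s"first  = ${argOrElse(0, "none")}")
println(s"sixth  = ${argOrElse(5, "none")}")
```

Reading arguments through a helper with a default keeps the script from failing with an ArrayIndexOutOfBoundsException when it is started with fewer arguments than expected.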
In this context you can assume that the Spark shell is just a normal Scala REPL, so the same rules apply. You can get a list of the available commands using :help.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :help
All commands can be abbreviated, e.g., :he instead of :help.
:edit <id>|<line>        edit history
:help [command]          print this summary or command-specific help
:history [num]           show the history (optional num is commands to show)
:h? <string>             search the history
:imports [name name ...] show import history, identifying sources of names
:implicits [-v]          show the implicits in scope
:javap <path|class>      disassemble a file or class name
:line <id>|<line>        place line(s) at the end of history
:load <path>             interpret lines in a file
:paste [-raw] [path]     enter paste mode or paste a file
:power                   enable power user mode
:quit                    exit the interpreter
:replay [options]        reset the repl and replay all previous commands
:require <path>          add a jar to the classpath
:reset [options]         reset the repl to its initial state, forgetting all session entries
:save <path>             save replayable session to a file
:sh <command line>       run a shell command (result is implicitly => List[String])
:settings <options>      update compiler options, if possible; see reset
:silent                  disable/enable automatic printing of results
:type [-v] <expr>        display the type of an expression without evaluating it
:kind [-v] <expr>        display the kind of expression's type
:warnings                show the suppressed warnings from the most recent line which had any
As you can see above, you can invoke shell commands using :sh. For example:
scala> :sh mkdir foobar
res0: scala.tools.nsc.interpreter.ProcessResult = `mkdir foobar` (0 lines, exit 0)

scala> :sh touch foobar/foo
res1: scala.tools.nsc.interpreter.ProcessResult = `touch foobar/foo` (0 lines, exit 0)

scala> :sh touch foobar/bar
res2: scala.tools.nsc.interpreter.ProcessResult = `touch foobar/bar` (0 lines, exit 0)

scala> :sh ls foobar
res3: scala.tools.nsc.interpreter.ProcessResult = `ls foobar` (2 lines, exit 0)

scala> res3.lines foreach println
bar
foo
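The same directory work can also be done in plain Scala instead of through :sh, which is handy when you want the result as an ordinary value. This is a minimal sketch that assumes only the JDK classes java.io.File and java.nio.file.Files. The names makeDir and listNames are hypothetical helpers, not part of Spark or the REPL, and the demo runs under a temporary directory so it leaves your working directory untouched.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical helper: create the directory (and any missing parents).
// Returns true if the directory exists when the call finishes.
def makeDir(path: String): Boolean = {
  val dir = new File(path)
  dir.isDirectory || dir.mkdirs()
}

// Hypothetical helper: sorted names of the entries in a directory.
// listFiles() returns null for a missing or unreadable path, so that
// case is mapped to an empty list.
def listNames(path: String): List[String] = {
  val entries = new File(path).listFiles()
  if (entries == null) Nil else entries.map(_.getName).sorted.toList
}

// Demo: mirror the :sh session above inside a temporary directory.
val base = Files.createTempDirectory("spark-shell-demo").toFile
val demoDir = new File(base, "foobar").getPath

makeDir(demoDir)
new File(demoDir, "foo").createNewFile()
new File(demoDir, "bar").createNewFile()
listNames(demoDir).foreach(println)
```

In the shell, :paste is a convenient way to enter a multi-line block like this. The last line prints bar and then foo, matching the ls output above.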
The :q or :quit command exits the Scala REPL.