I am new to Spark and I wanted to ask for some common guidelines on developing and testing code for the Apache Spark framework.
What is the most common setup for testing my code locally? Is there a pre-built VM to spin up (a ready-made box, etc.)? Do I have to set up Spark locally? Is there a test library for testing my code?
When going to cluster mode, I notice there are several ways to set up a cluster; production-wise, what is the most common way to set up a cluster for running Spark? I see three options here.
Thank you
1) Common setup: Just download a Spark release to your local machine, unzip it, and follow these steps to set it up locally.
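Not part of the original answer, but here is a minimal sketch of what a quick local-mode check might look like once Spark is on the classpath (the object name, toy dataset and assertion are purely illustrative). Running with master "local[*]" uses all local cores inside a single JVM, so no cluster or VM is needed, and the same pattern can be reused inside a unit test.

```scala
// Minimal local-mode sanity check (illustrative names; assumes Spark 2.x+ on the classpath).
import org.apache.spark.sql.SparkSession

object LocalSparkCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("local-spark-check")
      .master("local[*]")              // local mode: all cores, no cluster manager required
      .getOrCreate()

    // Tiny word count against an in-memory dataset, followed by a simple assertion.
    val words  = spark.sparkContext.parallelize(Seq("spark", "test", "spark"))
    val counts = words.map(w => (w, 1)).reduceByKey(_ + _).collectAsMap()

    assert(counts("spark") == 2, "expected 'spark' to appear twice")
    spark.stop()
  }
}
```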
2) Launching a cluster for production: The Spark cluster mode overview available here explains the key concepts of running a Spark cluster. Spark can be run both in a standalone way and on several existing cluster managers. Currently, several deployment options are available:
Amazon EC2
Standalone mode
Apache Mesos
Hadoop YARN
The EC2 scripts let you launch a cluster in about 5 minutes. In fact, if you are using EC2, the best way to go is to use the scripts provided by Spark. Standalone mode is the best choice for deploying Spark on a private cluster.
Normally we use YARN as the cluster manager when there is an existing Hadoop setup with YARN, and the same goes for Mesos. If instead you are creating a new cluster from scratch, I would recommend using standalone mode, assuming you are not using Amazon EC2 instances. This link shows some steps that help with setting up a standalone Spark cluster.
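To make the standalone option a bit more concrete, here is a rough sketch (not from the original answer) of how the application side usually looks: the master URL is not hardcoded but supplied at submit time, and the spark://master-host:7077 address in the comment is only a placeholder for your own standalone master.

```scala
// Sketch of an application prepared for a standalone cluster (placeholder host name).
// The master URL is supplied at submit time rather than hardcoded, e.g.:
//   spark-submit --class ClusterWordCount --master spark://master-host:7077 app.jar <input-path>
import org.apache.spark.sql.SparkSession

object ClusterWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cluster-word-count")   // no .master() here; it comes from spark-submit
      .getOrCreate()

    // Word count over an input path passed as the first argument (e.g. an HDFS path).
    val lines  = spark.sparkContext.textFile(args(0))
    val counts = lines.flatMap(_.split("\\s+"))
      .map(w => (w, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```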
Hopefully the Sandbox from Hortonworks will help:
HDP 2.2.4 Sandbox with Apache Spark & Ambari Views: http://hortonworks.com/products/hortonworks-sandbox/#install
The second resource I'm using is http://www.cloudera.com/downloads/quickstart_vms/5-8.html
The image contains Hadoop, HBase, Impala, Spark and many more features. It does require 4 GB of RAM, 1 CPU and 62.5 GB of disk. That is fairly large, but it is free and fulfills all the requirements, unlike the paid versions in the cloud.