Cannot use Apache Flink on Amazon EMR

I cannot start a YARN session of Apache Flink on Amazon EMR. The error message I get is:

$ tar xvfz flink-0.9.0-bin-hadoop26.tgz
$ cd flink-0.9.0
$ ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
...
Diagnostics: File file:/home/hadoop/.flink/application_1439466798234_0008/flink-conf.yaml does not exist
java.io.FileNotFoundException: File file:/home/hadoop/.flink/application_1439466798234_0008/flink-conf.yaml does not exist
...

I am using Flink version 0.9 and Amazon's Hadoop version 4.0.0. Any ideas or hints?

The full log can be found here: https://gist.github.com/headmyshoulder/48279f06c1850c62c28c

asked Aug 13 '15 15:08

headmyshoulder

1 Answer

From the log:

The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the sytem is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system

Flink failed to read the Hadoop configuration files. They are either picked up from the environment variables, e.g. HADOOP_HOME, or you can set the configuration dir in the flink-conf.yaml before you execute your YARN command.
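For the second option, a minimal sketch of writing the setting into flink-conf.yaml. It assumes the configuration key is fs.hdfs.hadoopconf, which is the name I remember from the Flink 0.9 configuration page, so check it against the documentation for your version. The helper name set_hadoop_conf_key is hypothetical and only there for illustration.

```shell
# Hypothetical helper: add the Hadoop configuration directory to a Flink
# configuration file unless the key is already present.
# Assumption: the key is called fs.hdfs.hadoopconf (verify for your Flink version).
set_hadoop_conf_key() {
    conf_file="$1"
    hadoop_conf_dir="$2"
    if grep -q '^fs\.hdfs\.hadoopconf:' "$conf_file" 2>/dev/null; then
        echo "fs.hdfs.hadoopconf already set in $conf_file" >&2
        return 0
    fi
    # Append the key; existing settings in the file are left untouched.
    echo "fs.hdfs.hadoopconf: $hadoop_conf_dir" >> "$conf_file"
}

# Example, run from the flink-0.9.0 directory. The directory argument is an
# assumption; use the Hadoop configuration directory of your own cluster:
#   set_hadoop_conf_key conf/flink-conf.yaml /etc/hadoop/conf
```

The helper only appends, so it is safe to run more than once before starting the YARN session.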

Flink needs to read the Hadoop configuration to know how to upload the Flink jar to the cluster file system such that the newly created YARN cluster can access it. If Flink fails to resolve the Hadoop configuration, it uses the local file system for uploading the jar. That means that the jar will be put on the machine you launch your cluster from. Thus, it won't be accessible from the Flink YARN cluster.

Please see the Flink configuration page for more information.

Edit: On Amazon EMR, running export HADOOP_CONF_DIR=/etc/hadoop/conf lets Flink discover the Hadoop configuration directory.
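Put together, the fix can be sketched as a small pre-flight check before starting the session. The helper check_hadoop_conf is hypothetical; it only tests that the directory contains a core-site.xml, on the assumption that this file is where the cluster's default file system is configured.

```shell
# Hypothetical helper: succeed only if the given directory looks like a
# Hadoop configuration directory, i.e. it contains core-site.xml.
check_hadoop_conf() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "HADOOP_CONF_DIR is not set" >&2
        return 1
    fi
    if [ ! -f "$dir/core-site.xml" ]; then
        echo "no core-site.xml found in $dir" >&2
        return 1
    fi
    return 0
}

# On EMR, using the path from the edit above and the command from the question:
#   export HADOOP_CONF_DIR=/etc/hadoop/conf
#   check_hadoop_conf "$HADOOP_CONF_DIR" && ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
```

If the check fails, the session would most likely fall back to the 'file' scheme again and hit the same FileNotFoundException.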

answered Sep 19 '22 21:09

mxm

Related questions
                            
How to run 2 EMR Spark Step Concurrently?
Sqoop - Binding to YARN queues
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.lookupTimeout
Slave nodes not in Yarn ResourceManager
Spark Failure : Caused by: org.apache.spark.shuffle.FetchFailedException: Too large frame: 5454002341
auxService:mapreduce_shuffle does not exist on hive
Hadoop: specify yarn queue for distcp
How to set the precise max number of concurrently running tasks per node in Hadoop 2.4.0 on Elastic MapReduce
Making spark use /etc/hosts file for binding in YARN cluster mode
Why would Spark choose to do all work on a single node?
Hadoop Ports Clarification
Spark 1.3.0 on YARN: Application failed 2 times due to AM Container
Yarn slave nodes are not communicating with master node?
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register
spark on yarn, Connecting to ResourceManager at /0.0.0.0:8032
What is the difference between Driver and Application manager in spark
Can't run a MapReduce job on hadoop 2.4.0
Hive Runtime Error while processing row in Hive
Is multithreading allowed on Spark/YARN?

Tags: hadoop-yarn, apache-flink, emr, amazon-emr
