I have a simple Hadoop job that crawls websites and caches them to HDFS. The mapper checks whether a URL already exists in HDFS; if so, it uses the cached copy, otherwise it downloads the page and saves it to HDFS.
If a network or HTTP error (404, etc.) is encountered while downloading a page, the URL is skipped entirely and nothing is written to HDFS. Whenever I run the job on a small list of ~1000 websites, I always seem to hit the error below, which crashes the job repeatedly on my pseudo-distributed installation. What could be the problem?
I'm running Hadoop 0.20.2-cdh3u3.
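For context, the caching part of the mapper looks roughly like this (a simplified sketch, not the exact code; md5Hex() and download() stand in for the real helpers):

    // Inside the Mapper<LongWritable, Text, ...> subclass
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String url = value.toString();
        FileSystem fs = FileSystem.get(context.getConfiguration());
        Path cached = new Path("/user/raj/cache/" + md5Hex(url));

        if (!fs.exists(cached)) {
            byte[] page = download(url);   // returns null on 404 / network errors
            if (page == null) {
                return;                    // skip the URL entirely
            }
            FSDataOutputStream out = fs.create(cached);
            out.write(page);
            out.close();
        }

        FSDataInputStream in = fs.open(cached);
        // ... parse the cached page and emit records ...
    }

This is the exception that keeps crashing the job: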
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/raj/cache/9b4edc6adab6f81d5bbb84fdabb82ac0 could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1520)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:665)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
When a file is written to HDFS, its blocks are replicated to multiple DataNodes. When you see this error, it means the NameNode does not have any available DataNode to write the block to; in other words, block replication is not taking place at all.
Some background on replication: every file stored in HDFS is split into fixed-size blocks, and all blocks of a file except the last one are the same size. Each block is replicated for fault tolerance. The default replication factor is 3, so every block gets two additional copies, each stored on a separate DataNode; if one copy becomes inaccessible or corrupted, the data can still be read from another. The factor is configurable per cluster or per file and can be lowered or raised as needed.
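On a pseudo-distributed installation there is only one DataNode, so the effective replication factor has to be 1. A quick way to see what the client is requesting, sketched with the standard dfs.replication property (illustrative, not part of the original job):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ReplicationCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Replication the client will request for new files (default is 3).
            System.out.println("dfs.replication = " + conf.getInt("dfs.replication", 3));

            // On a single-DataNode (pseudo-distributed) setup this is normally 1,
            // set in hdfs-site.xml or overridden programmatically before writing:
            conf.setInt("dfs.replication", 1);

            FileSystem fs = FileSystem.get(conf);
            // Replication of an already-written file can also be changed, e.g.:
            // fs.setReplication(new Path("/user/raj/cache/<file>"), (short) 1);
            fs.close();
        }
    }

That said, the replication factor itself is usually not what triggers "could only be replicated to 0 nodes"; as noted above, the message means no live DataNode was able to accept the block at that moment.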
The problem turned out to be an unclosed FileSystem InputStream in the mapper, used when caching the input to the file system.
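In practice that means every stream opened against HDFS in the mapper has to be closed (in a finally block, or with Hadoop's IOUtils); otherwise each map call leaks a client-side handle, which appears to be what eventually left the single DataNode unable to accept new blocks. A minimal sketch of the corrected pattern (assumed names, not the original code):

    FSDataInputStream in = fs.open(cached);
    try {
        // ... consume the cached page ...
    } finally {
        // org.apache.hadoop.io.IOUtils: closes the stream, safe to call in finally
        IOUtils.closeStream(in);
    }

The same applies to the FSDataOutputStream used when writing a freshly downloaded page to the cache.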