What are SUCCESS and part-r-00000 files in Hadoop?

Tags: hadoop, mapreduce

Although I use Hadoop frequently on my Ubuntu machine, I have never thought about the SUCCESS and part-r-00000 files. The output always resides in the part-r-00000 file, but what is the SUCCESS file for? Why is the output file named part-r-00000? Does the name follow a naming convention, or is it arbitrary?

asked May 19 '12 15:05

Ravi Joshi

1 Answer

See http://www.cloudera.com/blog/2010/08/what%E2%80%99s-new-in-apache-hadoop-0-21/

On the successful completion of a job, the MapReduce runtime creates a _SUCCESS file in the output directory. This may be useful for applications that need to see if a result set is complete just by inspecting HDFS. (MAPREDUCE-947)

Job scheduling systems (such as Oozie) typically use this file as a signal that all the data has been written, so follow-on processing of the directory's contents can begin.
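As a rough illustration of how a downstream step might use the marker, here is a minimal Python sketch. It assumes the job's output directory is reachable as an ordinary local path (for example after copying it out of HDFS); the function name job_output_is_complete is hypothetical and not part of any Hadoop API.

```python
from pathlib import Path

# Name of the marker file the MapReduce runtime writes on successful completion.
SUCCESS_MARKER = "_SUCCESS"


def job_output_is_complete(output_dir):
    """Return True if output_dir contains the _SUCCESS marker file.

    Assumption: output_dir is a local filesystem path, not an hdfs:// URI.
    """
    return (Path(output_dir) / SUCCESS_MARKER).is_file()
```

A scheduler-like script could call job_output_is_complete on the output directory and start follow-on processing only once it returns True.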

Update (in response to comment)

The output files are named part-x-yyyyy by default, where:

x is 'm' if the file was written by a map task (a map-only job), or 'r' if it was written by a reduce task
yyyyy is the zero-based mapper or reducer task number, padded to five digits

So a job which has 32 reducers will have files named part-r-00000 to part-r-00031, one for each reducer task.
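The naming scheme can be reproduced in a few lines of Python. This is only a sketch of the default pattern described above, not Hadoop's own code, and the helper name part_file_name is hypothetical.

```python
def part_file_name(task_index, kind="r"):
    """Build a default-style output file name such as part-r-00000.

    kind is 'm' for a map task or 'r' for a reduce task;
    task_index is the zero-based task number.
    """
    if kind not in ("m", "r"):
        raise ValueError("kind must be 'm' or 'r'")
    return f"part-{kind}-{task_index:05d}"


# A job with 32 reducers produces one file per reducer task.
names = [part_file_name(i) for i in range(32)]
print(names[0], names[-1])  # part-r-00000 part-r-00031
```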

answered Sep 28 '22 02:09

Chris White
