 

How to overwrite/reuse the existing output path for Hadoop jobs again and again

I want to overwrite/reuse the existing output directory when I run my Hadoop job daily. The output directory stores the summarized results of each day's run. If I specify the same output directory, the job fails with the error "output directory already exists".

How to bypass this validation?

asked Oct 10 '11 by yogesh

People also ask

When running MapReduce program if you provide the output directory path which already exists then what will happen?

What will happen if the output directory already exists for a MapReduce job? The job will fail with a FileAlreadyExistsException: Hadoop refuses to write into an existing output directory so that previous results are never silently overwritten.

Can we have Hadoop job output in multiple directories?

Yes, it is possible to have the output of Hadoop MapReduce Job written to multiple directories.

What is a mapper and reducer in Hadoop?

The mapper processes the input data and produces intermediate key/value pairs. The reduce stage combines the shuffle step with the reduce step: the reducer processes the data that comes from the mappers and, after processing, produces a new set of output, which is stored in HDFS.
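The map and reduce stages described above can be sketched in plain Java, with word counting as the example. This is a conceptual illustration only, not the Hadoop API; the class and method names are made up for the sketch:

```java
import java.util.*;

public class WordCountSketch {
    // Map stage: turn each input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Shuffle + reduce stage: group the pairs by key and sum the values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String line : new String[] {"hadoop stores output", "hadoop output"}) {
            shuffled.addAll(map(line));
        }
        System.out.println(reduce(shuffled)); // {hadoop=2, output=2, stores=1}
    }
}
```

In a real job each mapper and reducer runs on a separate node, and the framework performs the shuffle between them; the data flow is the same.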


2 Answers

What about deleting the directory before you run the job?

You can do this via shell:

hadoop fs -rmr /path/to/your/output/

(on Hadoop 2 and later, -rmr is deprecated in favor of: hadoop fs -rm -r /path/to/your/output/)

or via the Java API:

// the Configuration should contain a reference to your namenode
FileSystem fs = FileSystem.get(new Configuration());
// the second argument "true" deletes the folder recursively
fs.delete(new Path("/path/to/your/output"), true);
answered Nov 06 '22 by Thomas Jungblut


Jungblut's answer is your direct solution. Since I never trust automated processes to delete stuff (me personally), I'll suggest an alternative:

Instead of trying to overwrite, I suggest you make the output name of your job dynamic, including the time in which it ran.

Something like "/path/to/your/output-2011-10-09-23-04/". This way you can keep your old job output around in case you ever need to revisit it. In my system, which runs 10+ daily jobs, we structure the output as: /output/job1/2011/10/09/job1out/part-r-xxxxx, /output/job1/2011/10/10/job1out/part-r-xxxxx, etc.
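Building such a date-stamped directory name can be sketched in plain Java. The /output base, job1 name, and yyyy/MM/dd layout are assumptions matching the structure above; in a real driver you would pass the resulting string to the job's output-path setter (FileOutputFormat.setOutputPath in the Hadoop API):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class OutputPath {
    // Build a per-run output directory like /output/job1/2011/10/09/job1out
    // so each day's run writes to a fresh path and never collides.
    static String datedOutputDir(String base, String jobName, Date runDate) {
        String datePart = new SimpleDateFormat("yyyy/MM/dd").format(runDate);
        return base + "/" + jobName + "/" + datePart + "/" + jobName + "out";
    }

    public static void main(String[] args) {
        // In a real driver: FileOutputFormat.setOutputPath(job, new Path(dir));
        String dir = datedOutputDir("/output", "job1", new Date());
        System.out.println(dir);
    }
}
```

Because every run gets its own directory, the "output directory already exists" check never fires, and old results stay available for reprocessing or auditing.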

answered Nov 06 '22 by Donald Miner