Deleting file/folder from Hadoop

I'm running an EMR Activity inside a Data Pipeline analyzing log files and I get the following error when my Pipeline fails:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://10.208.42.127:9000/home/hadoop/temp-output-s3copy already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:905)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:905)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:879)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1316)
    at com.valtira.datapipeline.stream.CloudFrontStreamLogProcessors.main(CloudFrontStreamLogProcessors.java:216)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

How can I delete that folder from Hadoop?

cevallos.valtira asked May 28 '13 16:05


1 Answer

When you say delete from Hadoop, you really mean delete from HDFS.

To delete something from HDFS, do one of the following.

From the command line:

  • deprecated way:

hadoop dfs -rmr hdfs://path/to/file

  • new way (with Hadoop 2.4.1):

hdfs dfs -rm -r hdfs://path/to/file

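Or from Java:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
FileSystem fs = FileSystem.get(getConf()); // getConf() is available when your class extends Configured
fs.delete(new Path("path/to/file"), true); // second argument: true deletes recursively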
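
For the failing job in the question, the command-line variant could be wrapped in a small script that runs before the job is resubmitted. This is only a sketch: the function name remove_output_dir is made up for illustration, and the path is the one from the error message, so adjust both to your setup. It checks that the directory exists with hdfs dfs -test -d before removing it, so the step does not fail when there is nothing to clean up.

```shell
#!/bin/sh
# Sketch: clear a stale job output directory in HDFS before rerunning the job.
# remove_output_dir is a hypothetical helper name, not part of Hadoop.

remove_output_dir() {
    dir="$1"
    # `hdfs dfs -test -d` exits 0 when the path exists and is a directory.
    if hdfs dfs -test -d "$dir"; then
        # Recursive delete, same as the "new way" command above.
        hdfs dfs -rm -r "$dir"
    else
        echo "nothing to delete: $dir"
    fi
}
```

Usage, with the directory named in the exception:

    remove_output_dir /home/hadoop/temp-output-s3copy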
greedybuddha answered Sep 22 '22 11:09