 

Delete HDFS folder from Java

Tags:

java

hadoop

hdfs

In a Java app running on an edge node, I need to delete an HDFS folder if it exists. I need to do this before running a MapReduce job (with Spark) that outputs to that folder.

I found I could use the method

org.apache.hadoop.fs.FileUtil.fullyDelete(new File(url))

However, I can only make it work with a local folder (i.e., a file URL on the machine running the app). I tried using something like:

url = "hdfs://hdfshost:port/the/folder/to/delete";

where hdfs://hdfshost:port is the HDFS NameNode IPC address. I use the same address for the MapReduce job, so I know it is correct. However, the call does nothing.

So, what URL should I use, or is there another method?

Note: here is the simple project in question.

Asked Feb 27 '15 by Juh_

People also ask

How do I delete an HDFS folder?

Use an HDFS file manager to delete directories; check your Hadoop distribution's documentation to see whether it provides one. Alternatively, log into the Hadoop NameNode with an administrator account and use HDFS's rmr command (deprecated in newer Hadoop versions in favor of rm -r) to delete the directories.

What is the command to remove a file under HDFS?

rm: removes a file from HDFS, similar to the Unix rm command. On its own it does not delete directories; for a recursive delete, use rm -r.

Which of the following commands will create an empty file in HDFS's Mydir folder?

bin/hdfs dfs -mkdir /geeks    => '/' means an absolute path
bin/hdfs dfs -mkdir geeks2    => relative path: the folder is created relative to the home directory

touchz: creates an empty file.
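
For reference, roughly the same operations via the Java FileSystem API; a minimal sketch, where hdfs://namenode:8020 and the paths are placeholder values:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
    fs.mkdirs(new Path("/geeks"));                  // like: hdfs dfs -mkdir /geeks
    fs.createNewFile(new Path("/geeks/emptyfile")); // like: hdfs dfs -touchz /geeks/emptyfile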


2 Answers

This works for me.

Just add the following code to your WordCount program:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.mapreduce.Job;

...
Configuration conf = new Configuration();

Path output = new Path("/the/folder/to/delete");
FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:port"), conf);

// delete the existing output directory (true = recursive)
if (hdfs.exists(output)) {
    hdfs.delete(output, true);
}

Job job = Job.getInstance(conf, "word count");
...

You need to pass hdfs://hdfshost:port explicitly to FileSystem.get() to obtain the distributed file system; otherwise the code operates on the local file system only.
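
As a side note, an alternative sketch (assuming Hadoop 2.x or later, where fs.defaultFS is the configuration key): set the default file system once on the Configuration, and FileSystem.get(conf) then returns HDFS without an explicit URI. The NameNode address below is a placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode address
    FileSystem hdfs = FileSystem.get(conf);           // now resolves to HDFS
    Path output = new Path("/the/folder/to/delete");
    if (hdfs.exists(output)) {
        hdfs.delete(output, true); // true = recursive
    }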

Answered Sep 27 '22 by Jun


I do it this way:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    Configuration conf = new Configuration();
    conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
    conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
    FileSystem hdfs = FileSystem.get(URI.create("hdfs://<namenode-hostname>:<port>"), conf);
    // delete() takes a Path, not a String; true means recursive
    hdfs.delete(new Path("/path/to/your/file"), true);

You don't need hdfs://hdfshost:port/ in the file path itself; the URI you pass to FileSystem.get() already selects the distributed file system.
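
For completeness, a variant sketch of the same call using try-with-resources (FileSystem implements Closeable) and checking delete()'s boolean return; host, port, and path are placeholders:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    Configuration conf = new Configuration();
    try (FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf)) {
        // delete() returns false when the path did not exist
        boolean deleted = hdfs.delete(new Path("/path/to/your/file"), true);
        System.out.println(deleted ? "deleted" : "nothing to delete");
    }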

Answered Sep 27 '22 by Tucker