 

ListFiles from HDFS Cluster

Tags: java, hadoop

I am new to Hadoop. I am trying to access a Hadoop cluster (HDFS) and retrieve the list of files from a Java client running in Eclipse. After setting up the required configuration on the Hadoop Java client, some operations already work.

I can perform copyFromLocalFile and copyToLocalFile operations against HDFS from the client. Here is the problem I am facing: when I call the listFiles() method and print each result, I get

org.apache.hadoop.fs.LocatedFileStatus@d0085360
org.apache.hadoop.fs.LocatedFileStatus@b7aa29bf

Main method:

Properties props = new Properties();
props.setProperty("fs.defaultFS", "hdfs://<IPOFCLUSTER>:8020");
props.setProperty("mapreduce.jobtracker.address", "<IPOFCLUSTER>:8032");
props.setProperty("yarn.resourcemanager.address", "<IPOFCLUSTER>:8032");
props.setProperty("mapreduce.framework.name", "yarn");
// Setting up the required configurations
FileSystem fs = FileSystem.get(toConfiguration(props));
Path p4 = new Path("/user/myusername/inputjson1/");
RemoteIterator<LocatedFileStatus> ritr = fs.listFiles(p4, true);
while (ritr.hasNext()) {
    System.out.println(ritr.next().toString());
}

I have also tried FileContext, but again I only got the string form of the FileStatus object. Is it possible to get the file names while iterating over the remote HDFS directory? There is a method called getPath(). Is that the only way to retrieve the full path of each file through the Hadoop API, or is there another method that returns only the names of the files in a specified directory? Please help me with this. Thanks.

asked Jul 09 '12 11:07 by Logan




1 Answer

You can indeed use getPath(). It returns a Path object, which lets you query the name of the file.

Path p = ritr.next().getPath();
// returns the filename or directory name if directory
String name = p.getName();    
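Dropped into the loop from the question, the same call might look like the sketch below. It is only an illustration under a few assumptions: the class name ListHdfsFileNames and the helper fileNames are made up for this example, the Hadoop client libraries are on the classpath, and a plain Configuration is used in place of the question's toConfiguration(props) helper, which is not shown. The <IPOFCLUSTER> placeholder and the directory are taken from the question as-is.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical class name, used only for this sketch.
public class ListHdfsFileNames {

    // Collects the last path component (the bare file name) of every file under dir.
    static List<String> fileNames(FileSystem fs, Path dir, boolean recursive) throws IOException {
        List<String> names = new ArrayList<>();
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(dir, recursive);
        while (it.hasNext()) {
            LocatedFileStatus status = it.next();
            // getPath() is the full Path; getName() is only its final component.
            names.add(status.getPath().getName());
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a plain Configuration replaces the question's toConfiguration(props).
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://<IPOFCLUSTER>:8020"); // placeholder from the question

        try (FileSystem fs = FileSystem.get(conf)) {
            Path dir = new Path("/user/myusername/inputjson1/");
            for (String name : fileNames(fs, dir, true)) {
                System.out.println(name);
            }
        }
    }
}
```

If the full location is wanted instead of the bare name, printing status.getPath() rather than status.getPath().getName() should give it.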

The FileStatus object you get can also tell you whether the entry is a file or a directory.
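Printing that status object directly, as the loop in the question does, produced strings such as org.apache.hadoop.fs.LocatedFileStatus@d0085360. That is the shape of Java's default Object.toString(): the class name, an @ sign, and the hash code in hexadecimal. It suggests, as an inference from the printed output rather than from the Hadoop sources, that the class in that Hadoop version did not override toString(). The small sketch below reproduces the effect with plain Java only. The Status class and the a.json file name are hypothetical stand-ins, not Hadoop types or real files.

```java
public class DefaultToStringDemo {

    // Hypothetical stand-in for a status object whose class does not override toString().
    static final class Status {
        private final String path;

        Status(String path) {
            this.path = path;
        }

        String getPath() {
            return path;
        }
    }

    // Rebuilds the string that Object.toString() is documented to return.
    static String defaultForm(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Status s = new Status("/user/myusername/inputjson1/a.json");

        // Prints the class name, '@', and a hex hash code; the path does not appear.
        System.out.println(s);

        // Asking the object for the field explicitly prints the useful value.
        System.out.println(s.getPath());
    }
}
```

The same reasoning applies to the Hadoop loop: calling an accessor such as getPath() on each status yields something readable, while printing the status itself may not.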

Here is more API documentation:

http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/fs/Path.html

http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/fs/FileStatus.html

answered Sep 26 '22 06:09 by Thomas Jungblut