I am new to Hadoop. I am trying to access a Hadoop cluster (HDFS) and retrieve the list of files from a client in Eclipse. After setting up the required configurations on the Hadoop Java client, I can perform copyFromLocalFile and copyToLocalFile operations against HDFS. Here is what I am facing: when I call listFiles() I only get output like
org.apache.hadoop.fs.LocatedFileStatus@d0085360
org.apache.hadoop.fs.LocatedFileStatus@b7aa29bf
Main method:
Properties props = new Properties();
props.setProperty("fs.defaultFS", "hdfs://<IPOFCLUSTER>:8020");
props.setProperty("mapreduce.jobtracker.address", "<IPOFCLUSTER>:8032");
props.setProperty("yarn.resourcemanager.address", "<IPOFCLUSTER>:8032");
props.setProperty("mapreduce.framework.name", "yarn");
FileSystem fs = FileSystem.get(toConfiguration(props)); // Setting up the required configurations
Path p4 = new Path("/user/myusername/inputjson1/");
RemoteIterator<LocatedFileStatus> ritr = fs.listFiles(p4, true);
while (ritr.hasNext()) {
    System.out.println(ritr.next().toString());
}
I have also tried FileContext and only ended up with the FileStatus object's string representation. Is there a way to get just the file names when I iterate over the remote HDFS directory? There is a method called getPath(). Is that the only way to retrieve the full path of the files using the Hadoop API, or is there another method that returns only the names of the files in a specified directory path? Please help me with this. Thanks.
You can indeed use getPath(); it returns a Path object, which lets you query the name of the file.
Path p = ritr.next().getPath();
// returns the filename or directory name if directory
String name = p.getName();
The FileStatus object you get can also tell you whether it is a file or a directory.
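Putting it together, here is a minimal sketch, assuming the same placeholder cluster address and a hypothetical toConfiguration helper like the one called in the question. It prints only the file names from listFiles(), and shows how isFile()/isDirectory() can be checked on the statuses returned by listStatus():

import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListHdfsFileNames {

    // Assumed helper, mirroring the toConfiguration(props) call in the question:
    // copies each property into a Hadoop Configuration.
    private static Configuration toConfiguration(Properties props) {
        Configuration conf = new Configuration();
        for (String name : props.stringPropertyNames()) {
            conf.set(name, props.getProperty(name));
        }
        return conf;
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("fs.defaultFS", "hdfs://<IPOFCLUSTER>:8020"); // placeholder from the question

        FileSystem fs = FileSystem.get(toConfiguration(props));
        Path dir = new Path("/user/myusername/inputjson1/");

        // listFiles(dir, true) recursively returns files only;
        // getPath().getName() gives just the last path component.
        RemoteIterator<LocatedFileStatus> ritr = fs.listFiles(dir, true);
        while (ritr.hasNext()) {
            System.out.println(ritr.next().getPath().getName());
        }

        // listStatus(dir) returns the direct children, both files and directories,
        // so isFile()/isDirectory() can be used to tell them apart.
        for (FileStatus status : fs.listStatus(dir)) {
            String kind = status.isDirectory() ? "dir " : "file";
            System.out.println(kind + "  " + status.getPath().getName());
        }
    }
}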
Here is more API documentation:
http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/fs/Path.html
http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/fs/FileStatus.html