
Get the last updated file in HDFS

I want the most recently updated file from one of my HDFS directories. The code should loop through the directories and subdirectories and then return the path, including the file name, of the latest file. I was able to get the latest file in the local file system, but I am not sure how to do it for HDFS.

find /tmp/sdsa -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head

The above code works for the local file system. I am able to get the date, time, and file name from HDFS, but how do I get the latest file using these three parameters?

This is the code I tried:

hadoop fs -ls -R /tmp/apps | awk -F" " '{print $6" "$7" "$8}'

Any help will be appreciated.

Thanks in advance.

Neethu Lalitha asked Jan 09 '16 01:01




1 Answer

This one worked for me:

hadoop fs -ls -R /tmp/app | awk -F" " '{print $6" "$7" "$8}' | sort -nr | head -1 | cut -d" " -f3

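The output is the full path of the most recently modified entry. The date (yyyy-MM-dd) and time (HH:mm) columns sort chronologically as text, so the first line after a reverse sort is the newest one.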
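One caveat with the command above is that `hadoop fs -ls -R` also lists directories, so the newest entry may be a directory rather than a file. A minimal sketch of a variant that keeps only regular files follows. The function name `latest_hdfs_file` is hypothetical, and the sketch assumes the usual eight-column listing (permissions, replication, owner, group, size, date, time, path) and paths without spaces.

```shell
# Reads `hadoop fs -ls -R` output on stdin and prints the path of the
# most recently modified regular file.
# Assumptions: eight-column listing, dates as yyyy-MM-dd HH:mm, no spaces in paths.
latest_hdfs_file() {
  # Keep only regular files: their permission string starts with "-"
  # (directories start with "d"), then print date, time, and path.
  awk '$1 ~ /^-/ { print $6, $7, $8 }' |
    sort -r |       # plain reverse text sort puts the newest timestamp first
    head -n 1 |     # keep the newest entry only
    cut -d" " -f3-  # drop the date and time, leaving the path
}

# Usage, with the directory from the answer above:
#   hadoop fs -ls -R /tmp/app | latest_hdfs_file
```

A plain `sort -r` is used here instead of `sort -nr` because the timestamps are compared as text, not as numbers.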

Neethu Lalitha answered Sep 17 '22 17:09
