Loop over files in HDFS directory

Tags: bash, hadoop, hdfs
I need to loop over all csv files in a Hadoop file system. I can list all of the files in a HDFS directory with

> hadoop fs -ls /path/to/directory
Found 2 items
drwxr-xr-x   - hadoop hadoop          2 2016-10-12 16:20 /path/to/directory/tmp
-rwxr-xr-x   3 hadoop hadoop 4691945927 2016-10-12 19:37 /path/to/directory/myfile.csv

and can loop over all files in a standard directory with

for filename in /path/to/another/directory/*.csv; do echo "$filename"; done

but how can I combine the two? I've tried

for filename in `hadoop fs -ls /path/to/directory | grep csv`; do echo $filename; done

but that gives me some nonsense like

Found
2
items
drwxr-xr-x

hadoop
hadoop
2    
2016-10-12
....
Sal asked Oct 13 '16 01:10

1 Answer

This should work. The file path is the last field of each `hadoop fs -ls` line, so print it with `awk` and keep only the `.csv` entries (the dot is escaped so it matches literally, and the `tr` from the original is unnecessary because word splitting already handles newlines):

for filename in $(hadoop fs -ls /path/to/directory | awk '{print $NF}' | grep '\.csv$')
do echo "$filename"; done
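Note that a `for` loop over command substitution word-splits on any whitespace, so it breaks on paths containing spaces. A more robust sketch pipes the listing into a `while read` loop instead; the sample `ls` output below stands in for a live cluster, since `hadoop` may not be available where this runs:

```shell
# Simulated `hadoop fs -ls` output (placeholder data, not from a real cluster)
ls_output='Found 2 items
drwxr-xr-x   - hadoop hadoop          2 2016-10-12 16:20 /path/to/directory/tmp
-rwxr-xr-x   3 hadoop hadoop 4691945927 2016-10-12 19:37 /path/to/directory/myfile.csv'

# Take the last field (the path), keep .csv files, and read line by line
# so embedded spaces in a path do not split one file into several words.
printf '%s\n' "$ls_output" \
  | awk '{print $NF}' \
  | grep '\.csv$' \
  | while IFS= read -r filename; do
      echo "$filename"
    done
```

Against a real cluster, replace the `printf` with `hadoop fs -ls /path/to/directory`. The header line `Found 2 items` is harmless here: `awk` turns it into `items`, which the `grep '\.csv$'` filter then drops.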
matesc answered Oct 15 '22 04:10