
hdfs dfs command is slow - is there a way to make it faster?

Tags: hadoop, hdfs

I am on Hadoop 2.2.0, running a Single Node setup.

My understanding is that hdfs dfs -ls is slow because it spins up a new JVM every time it is invoked.

Is there any way to keep the JVM running so that simple commands complete faster?

Asked by merlin2011 on Jan 19 '14 at 23:01



1 Answer

I would like to share a solution we built for this problem.

We created a new utility, HDFS Shell, to make working with HDFS faster.

https://github.com/avast/hdfs-shell

  • hdfs dfs starts a JVM for each command call, while HDFS Shell starts one only once, which is a large speedup when you work with HDFS frequently
  • Commands can be used in a short form, e.g. hdfs dfs -ls / and ls / will both work
  • HDFS path completion using the TAB key
  • we can easily add any other HDFS manipulation function
  • command history is persisted in a history log (~/.hdfs-shell/hdfs-shell.log)
  • support for relative directories, plus the commands cd and pwd
  • and much more...
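
To illustrate the short-form point from the list above, here is a minimal sketch of how a long-running shell might treat hdfs dfs -ls / and ls / as the same command. This is not taken from the HDFS Shell source; the function name normalize_command is hypothetical, and the sketch only covers parsing, not the actual HDFS call.

```python
import shlex


def normalize_command(line):
    """Return (command, args) for either the long or the short form.

    Hypothetical helper for illustration only; not HDFS Shell's own parser.
    """
    tokens = shlex.split(line)
    # Drop an optional leading "hdfs dfs" so the long form matches the short one.
    if tokens[:2] == ["hdfs", "dfs"]:
        tokens = tokens[2:]
    if not tokens:
        raise ValueError("empty command")
    # "-ls" and "ls" name the same command, so strip leading dashes.
    command = tokens[0].lstrip("-")
    if not command:
        raise ValueError("missing command name")
    return command, tokens[1:]


print(normalize_command("hdfs dfs -ls /"))  # ('ls', ['/'])
print(normalize_command("ls /"))            # ('ls', ['/'])
```

Because parsing like this would happen inside a single process that stays alive, each command would skip the startup cost that a fresh hdfs dfs invocation pays.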
Answered by Vity on Sep 20 '22 at 20:09