
Locking a directory in HDFS

Tags:

hadoop

hdfs

Is there a way to acquire a lock on a directory in HDFS? Here's what I am trying to do:

I have a directory called ../latest/...

Every day I need to add fresh data to this directory, but before I copy new data in, I want to acquire a lock so that no one is reading the directory while the copy is in progress.

Is there a way to do this in HDFS?

DilTeam asked Feb 19 '14

1 Answer

No, there is no way to do this through HDFS.

In general, when I have this problem, I copy the data into a random temp location and then move it once the copy is complete. This works well because `mv` is nearly instantaneous, while copying takes much longer. That way, if you check whether anyone else is writing and then `mv`, the window during which the "lock" must be held is much shorter:

  1. Generate a random number
  2. Put the data into a new folder in HDFS, e.g. /tmp/$randomnumber
  3. Check to see if the destination is OK (hadoop fs -ls perhaps)
  4. hadoop fs -mv the data to the latest directory.

There is a slim chance that between steps 3 and 4 someone might clobber something. If that really makes you nervous, you could implement a simple lock in ZooKeeper; Curator can help you with that.

Donald Miner answered Oct 04 '22