HDFS is the heart of Hadoop, I get that. But what if I don't want to store my data on HDFS? Instead, I want to analyze and run Hadoop jobs on data that lives on a remote server accessible via the NFS protocol. How do I do that?
For example, I want to run TeraGen against a path on the NFS server, like below:
hadoop jar hadoop-mapreduce-examples.jar teragen 1000000000 nfs://IP/some/path
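(For what it's worth, I realize that if the export were mounted at the same mount point on every node, I could probably already go through the local file system, roughly like the command below, where /mnt/nfs is just a placeholder for the mount point. But I would like to address the server directly with an nfs:// style URI, or at least hear better ideas.)

hadoop jar hadoop-mapreduce-examples.jar teragen 1000000000 file:///mnt/nfs/some/path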
I am just looking for ideas on how to do this, and I do understand the trade-offs involved (HDFS vs. NFS). So while I appreciate anyone telling me it's a bad idea, I still want to do it for an experiment I am running.
I could probably code something to make this happen, but any pointers on where to start would be helpful and much appreciated. I also don't want to reinvent the wheel, so if something like this already exists and I am simply unaware of it, please comment and let me know. Anything I build will be made open source so that others can benefit as well.
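For context, here is the kind of skeleton I imagine I would have to start from: a custom org.apache.hadoop.fs.FileSystem implementation for an nfs:// scheme. Everything below (the class name, the scheme, and the TODO stubs) is just a placeholder sketch of the methods Hadoop requires, not working code:

import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

/**
 * Placeholder skeleton for a Hadoop FileSystem that would talk to an NFS
 * server directly. Every stub below would need a real implementation on
 * top of an NFS client library.
 */
public class NfsFileSystem extends FileSystem {

  private URI uri;
  private Path workingDir = new Path("/");

  @Override
  public void initialize(URI name, Configuration conf) throws IOException {
    super.initialize(name, conf);   // sets up statistics etc.
    this.uri = name;                // e.g. nfs://IP/some/path
    // TODO: open a connection to the NFS server here
  }

  @Override
  public String getScheme() {
    return "nfs";                   // makes nfs:// URIs resolve to this class
  }

  @Override
  public URI getUri() {
    return uri;
  }

  // The methods below are what Hadoop actually calls during a job.

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    throw new UnsupportedOperationException("TODO: read a file over NFS");
  }

  @Override
  public FSDataOutputStream create(Path f, FsPermission permission,
      boolean overwrite, int bufferSize, short replication, long blockSize,
      Progressable progress) throws IOException {
    throw new UnsupportedOperationException("TODO: create/write a file over NFS");
  }

  @Override
  public FSDataOutputStream append(Path f, int bufferSize, Progressable progress)
      throws IOException {
    throw new UnsupportedOperationException("TODO: append over NFS");
  }

  @Override
  public boolean rename(Path src, Path dst) throws IOException {
    throw new UnsupportedOperationException("TODO: rename over NFS");
  }

  @Override
  public boolean delete(Path f, boolean recursive) throws IOException {
    throw new UnsupportedOperationException("TODO: delete over NFS");
  }

  @Override
  public FileStatus[] listStatus(Path f) throws FileNotFoundException, IOException {
    throw new UnsupportedOperationException("TODO: list a directory over NFS");
  }

  @Override
  public void setWorkingDirectory(Path newDir) {
    workingDir = newDir;
  }

  @Override
  public Path getWorkingDirectory() {
    return workingDir;
  }

  @Override
  public boolean mkdirs(Path f, FsPermission permission) throws IOException {
    throw new UnsupportedOperationException("TODO: mkdirs over NFS");
  }

  @Override
  public FileStatus getFileStatus(Path f) throws IOException {
    throw new UnsupportedOperationException("TODO: stat a path over NFS");
  }
}

As far as I understand, once such a class is implemented and on the cluster classpath, it can be registered in core-site.xml via fs.nfs.impl so that nfs:// URIs resolve to it. If I am starting from the wrong end here, corrections are welcome.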
Do you know this site: https://blog.netapp.com/blogs/run-big-data-analytics-natively-on-nfs-data/
It looks like you can swap out HDFS for NFS at the bottom, while at the higher abstraction layers everything keeps working as before, since MapReduce/YARN take care of everything for you.
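If that is right, the swap itself should come down to a couple of properties in core-site.xml, something along these lines (the values below are placeholders from my side, not verified settings; the connector's documentation will have the real ones):

fs.nfs.impl = <fully qualified FileSystem class shipped by the NFS connector>
fs.defaultFS = nfs://IP:2049/   (only if NFS is to replace HDFS as the default file system)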
I cannot say yet whether or not this works, as we are currently preparing to set up such a "native NFS Hadoop" ourselves. I will come back to you with more details in a few months.