
What kind of JBOD is meant in Hadoop? And what about COW with Hadoop?

New to Hadoop; I've only set up a 3-node Debian server cluster for practice.

I was researching best practices for Hadoop and came across this advice: JBOD, no RAID; filesystem: ext3, ext4, or xfs; none of that fancy COW stuff you see with zfs and btrfs.

So I raise these questions...


Everywhere I read that JBOD is better than RAID in Hadoop, and that the best filesystems are xfs, ext3, and ext4. Aside from the filesystem part, which totally makes sense, how do you actually implement this JBOD? You will see my confusion if you do the Google search yourself: JBOD usually alludes to a linear appendage or combination of just a bunch of disks, kind of like a logical volume (at least that's how some people explain it), but Hadoop seems to want a JBOD that doesn't combine the disks. Nobody expands on that...

  • Question 1) What does everyone in the Hadoop world mean by JBOD, and how do you implement it?
  • Question 2) Is it as simple as mounting each disk to a different directory?
  • Question 3) Does that mean that Hadoop runs best on a JBOD where each disk is simply mounted to a different directory?
  • Question 4) And then you just point Hadoop at those data.dirs?

  • Question 5) I see JBODs going 2 ways: either each disk going to a separate mount, or a linear concat of disks, which can be done with mdadm in --linear mode (or with LVM, I bet), so I don't see the big deal with that... And if it's the latter case, where mdadm --linear or LVM can be used because the JBOD people are referring to is this concat of disks, then which is the best way to "JBOD" or linearly concatenate disks for Hadoop?


This is off topic, but can someone verify whether this is correct as well? Filesystems that use COW (copy on write), like zfs and btrfs, just slow Hadoop down, and on top of that the COW implementation is a waste with Hadoop.

  • Question 6) Why are COW and things like RAID a waste with Hadoop? I see it like this: if your system crashes and you use the COW-ness of it to restore it, by the time you have restored your system there have been so many changes to HDFS that it will probably just consider that machine faulty, and it would be better to rejoin it from scratch (bring it up as a fresh new datanode)... Or how will the Hadoop system see the older datanode? My guess is it won't think it's old or new or even a datanode; it will just see it as garbage... I don't know...

  • Question 7) What happens if Hadoop sees a datanode that fell off the cluster and then comes back online with slightly older data? Is there a limit to how old the data can be? How does this work?


RE-ASKING QUESTIONS 1 THROUGH 4

  • I just realized my question is so simple, yet so hard for me to explain, that I had to split it into 4 questions, and I still didn't get the answer I'm looking for from what sound like very smart individuals, so I must re-ask it differently...

  • On paper I could explain it easily, or with a drawing... I'll attempt with words again...

  • In case you are confused about what I am asking in the JBOD question...

  • ** I am just wondering what kind of JBOD everyone keeps referring to in the Hadoop world, that's all **

  • JBOD is defined differently in the Hadoop world than in the normal world, and I want to know whether the best way to implement Hadoop is on a concat of disks (sda+sdb+sdc+sdd) or with the disks left alone (sda, sdb, sdc, sdd)

  • I think the graphical representations below explain what I am asking best

(JBOD METHOD 1)

  • Normal world: a JBOD is a concatenation of disks. If you were to use Hadoop, you would overlay the data.dir (where HDFS virtually sits) onto a directory inside this concat of disks, and all of the disks would appear as one. So if you had sda, sdb, and sdc as the data disks in your node, you would make them appear as some entity1 (either with the motherboard's hardware, mdadm, or LVM), which is a linear concat of sda, sdb, and sdc. You would then mount this entity1 to a folder in the Unix namespace like /mnt/jbod/ and set up Hadoop to run within it. (A shell sketch follows below.)

  • TEXT SUMMARY: if disk1, disk2, and disk3 were 100 GB, 200 GB, and 300 GB respectively, then this JBOD would be 600 GB, and Hadoop would gain 600 GB of capacity from this node

TEXTO-GRAPHICAL OF A LINEAR CONCAT OF DISKS BEING A JBOD:

  • disk1, disk2, and disk3 used for the datanode for Hadoop
  • disk1 is sda, 100 GB
  • disk2 is sdb, 200 GB
  • disk3 is sdc, 300 GB
  • sda + sdb + sdc = a JBOD named entity1
  • JBOD MADE ANY WAY - WHO CARES - THAT'S NOT MY QUESTION: maybe we made the entity1 JBOD with LVM, or with mdadm in linear mode, or with hardware JBOD drivers that combine the disks and show them to the operating system as entity1; it doesn't matter, either way it's still a JBOD
  • This is the type of JBOD I am used to and keep coming across when I Google-search JBOD
  • cat /proc/partitions would show sda, sdb, sdc, and entity1, OR if we used hardware JBOD maybe sda, sdb, and sdc would not show and only entity1 would show; again, who cares how it shows
  • mount entity1 to /mnt/entity1
  • running "df" would show that entity1 is 100+200+300 = 600 GB
  • we then set up Hadoop to run its datanode on /mnt/entity1, so the data.dir property points at /mnt/entity1, and the cluster just gained 600 GB of capacity
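For what it's worth, here is a rough shell sketch of how JBOD METHOD 1 could be built with mdadm in linear mode (LVM or a hardware JBOD controller would land in the same place). The device names, the md device, and the mount point are made up for illustration, and the exact data.dir property name depends on the Hadoop version:

    # JBOD METHOD 1 (sketch): linear concat of three disks into one entity
    # assumed data disks: /dev/sda /dev/sdb /dev/sdc (illustrative names)
    mdadm --create /dev/md0 --level=linear --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
    mkfs.xfs /dev/md0                 # one filesystem over the whole concat
    mkdir -p /mnt/entity1
    mount /dev/md0 /mnt/entity1
    df -h /mnt/entity1                # one mount of roughly 600 GB
    # the datanode's data.dir would then point at a single directory,
    # e.g. /mnt/entity1/dfs/data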

..the other perspective is this..

(JBOD METHOD 2)

  • In Hadoop it seems to me they want every disk separate. So I would mount disks sda, sdb, and sdc in the Unix namespace to /mnt/a, /mnt/b, and /mnt/c... It seems, from reading across the web, that lots of Hadoop experts classify a JBOD as exactly that, just a bunch of disks, so to Unix they would look like disks, not a concat of disks... And then of course I could combine them into one entity either with the Logical Volume Manager (LVM) or with mdadm (in a RAID or linear fashion, linear preferred for a JBOD)...... but...... no, let's not combine them, because it seems that in the Hadoop world a JBOD is just a bunch of disks sitting by themselves (sketched below)...

  • If disk1, disk2, and disk3 were 100 GB, 200 GB, and 300 GB respectively, then each mount (disk1 -> /mnt/a, disk2 -> /mnt/b, disk3 -> /mnt/c) would be 100 GB, 200 GB, and 300 GB respectively, and Hadoop would gain 600 GB of capacity from this node

TEXTO-GRAPHICAL OF SEPARATE, UNCOMBINED DISKS BEING A JBOD:

  • disk1, disk2, and disk3 used for the datanode for Hadoop
  • disk1 is sda, 100 GB
  • disk2 is sdb, 200 GB
  • disk3 is sdc, 300 GB
  • WE DO NOT COMBINE THEM TO APPEAR AS ONE
  • sda mounted to /mnt/a
  • sdb mounted to /mnt/b
  • sdc mounted to /mnt/c
  • running "df" would show that sda, sdb, and sdc have the following sizes: 100, 200, and 300 GB respectively
  • we then set up Hadoop via its config files to lay its HDFS on this node across the following data.dirs: /mnt/a, /mnt/b, and /mnt/c... gaining 100 GB for the cluster from a, 200 GB from b, and 300 GB from c... for a total gain of 600 GB from this node... nobody using the cluster could tell the difference
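And a rough shell sketch of JBOD METHOD 2: one filesystem per disk, and every mount listed in the datanode's data directory property. The device names and paths are illustrative, and the property is dfs.data.dir in Hadoop 1.x (dfs.datanode.data.dir in 2.x):

    # JBOD METHOD 2 (sketch): no concat, each disk is its own filesystem and mount
    mkfs.xfs /dev/sda && mkdir -p /mnt/a && mount /dev/sda /mnt/a   # 100 GB
    mkfs.xfs /dev/sdb && mkdir -p /mnt/b && mount /dev/sdb /mnt/b   # 200 GB
    mkfs.xfs /dev/sdc && mkdir -p /mnt/c && mount /dev/sdc /mnt/c   # 300 GB
    df -h /mnt/a /mnt/b /mnt/c        # three separate mounts: 100, 200, 300 GB
    # hdfs-site.xml on this datanode (inside <configuration>) then lists every mount:
    #   <property>
    #     <name>dfs.data.dir</name>
    #     <value>/mnt/a/dfs/data,/mnt/b/dfs/data,/mnt/c/dfs/data</value>
    #   </property>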

SUMMARY OF QUESTION

** Which method is everyone referring to as BEST PRACTICE for Hadoop: the combined JBOD, or the separation of disks (which is still also a JBOD according to online documentation)? **

  • Both cases would gain Hadoop 600 GB... it's just that in 1. it looks like a concat, one entity that is a combo of all the disks, which is what I always thought a JBOD was, while in 2. each disk in the system is mounted to a different directory. The end result is the same to Hadoop capacity-wise... I'm just wondering which is the best way for performance
asked Jul 17 '13 by user2580961



1 Answer

I can try to answer a few of the questions - tell me where you disagree.

1. JBOD: just a bunch of disks; an array of drives, each of which is accessed directly as an independent drive. The Hadoop Definitive Guide, in the topic "Why not use RAID?", says that RAID read and write performance is limited by the slowest disk in the array. In addition, in the case of HDFS, replication of data occurs across different machines residing in different racks, which handles potential loss of data even if a rack fails. So RAID isn't that necessary. The namenode, though, can use RAID, as mentioned in the link.

2. Yes. That means independent disks (a JBOD) mounted in each of the machines (e.g. /disk1, /disk2, /disk3, etc.), but not partitioned.

3, 4 & 5: Read the addendum below.

6 & 7. Check this link to see how replication of blocks occurs

Addendum after the comment:

Q1. Which method is everyone referring to as BEST PRACTICE for Hadoop: the combined JBOD, or the separation of disks (which is still also a JBOD according to online documentation)?

Possible answer: From the Hadoop Definitive Guide:

You should also set the dfs.data.dir property, which specifies a list of directories for a datanode to store its blocks. Unlike the namenode, which uses multiple directories for redundancy, a datanode round-robins writes between its storage directories, so for performance you should specify a storage directory for each local disk. Read performance also benefits from having multiple disks for storage, because blocks will be spread across them, and concurrent reads for distinct blocks will be correspondingly spread across disks.

For maximum performance, you should mount storage disks with the noatime option. This setting means that last accessed time information is not written on file reads, which gives significant performance gains.
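A hedged sketch of applying that advice to the per-disk layout from the question (the mount points are the made-up /mnt/a, /mnt/b, /mnt/c from above; the permanent equivalent is adding noatime to each disk's options in /etc/fstab):

    # remount the datanode disks with noatime so reads don't write access times
    # (put "defaults,noatime" in /etc/fstab to make this persist across reboots)
    mount -o remount,noatime /mnt/a
    mount -o remount,noatime /mnt/b
    mount -o remount,noatime /mnt/c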

Q2. Why is LVM not a good idea?

Avoid RAID and LVM on TaskTracker and DataNode machines – it generally reduces performance.

This is because LVM creates a logical layer over the individual disks in a machine.

Check this link (TIP 1) for more details. There are use cases where LVM performed slowly when running Hadoop jobs.
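For contrast with the per-disk mounts above, this is roughly what the discouraged LVM layout would look like; every read and write then passes through the device-mapper layer sitting on top of the physical disks (device, volume group, and mount names are made up):

    # the LVM layering being advised against (sketch): all disks pooled into one volume
    pvcreate /dev/sda /dev/sdb /dev/sdc
    vgcreate hadoopvg /dev/sda /dev/sdb /dev/sdc
    lvcreate -l 100%FREE -n hadooplv hadoopvg
    mkfs.xfs /dev/hadoopvg/hadooplv
    mount /dev/hadoopvg/hadooplv /mnt/data   # one logical volume instead of three mounts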

answered Oct 14 '22 by SSaikia_JtheRocker