Is there a project which helps set up a Hadoop cluster on office desktops when they are idle?
I'd like to experiment with Hadoop/MapReduce/HBase but don't have access to 5-10 computers. The computers at work are idle after hours and are connected to each other over a very high-speed network. What's more, the data on these computers stays within our network, so there is no privacy issue.
In order for this to work I need a fairly lightweight monitor running on each machine. When the computer has been idle for X hours, it will join the cluster. If the user logs on, it has to drop out of the cluster and give all CPU/memory back.
Does something like this exist?
Companies often choose to run Hadoop clusters on public, private, or hybrid cloud resources rather than on-premises hardware to gain flexibility, availability, and cost control. Many cloud providers offer fully managed services for Hadoop, such as Dataproc from Google Cloud.
Instead of using one large computer to store and process the data, Hadoop clusters multiple computers together so that massive datasets can be analyzed in parallel, and therefore more quickly.
In HDFS, data is replicated to guard against data loss in unfavorable conditions such as a node crashing or hardware failing. HDFS also scales: data is stored across multiple nodes in the cluster, and more nodes can be added as requirements grow.
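As a rough illustration of the replication point, the replication factor of an individual HDFS file can be changed through the Hadoop FileSystem client API. This is only a sketch: it assumes the Hadoop client libraries are on the classpath and that core-site.xml points at the cluster's NameNode; the path /data/example.txt and the factor of 3 are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: ask HDFS to keep more copies of one file's blocks.
public class SetReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);       // connect to the cluster's file system
        Path file = new Path("/data/example.txt");  // hypothetical file

        // Keep 3 replicas of each block of this file; more replicas survive
        // more node/disk failures, at the cost of extra disk space.
        boolean accepted = fs.setReplication(file, (short) 3);
        System.out.println("Replication change accepted: " + accepted);

        fs.close();
    }
}
```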
Apache Hadoop is an open-source, Java-based software platform that manages data processing and storage for big data applications. It works by distributing big data and analytics jobs across the nodes of a computing cluster, breaking them down into smaller workloads that can be run in parallel.
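The classic illustration of that breakdown is a word-count job: the framework splits the input across the cluster, runs one map task per split in parallel, and then aggregates the per-word counts in the reduce phase. The sketch below follows the standard Hadoop MapReduce API; the input and output HDFS paths are passed on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Word count: many map tasks run in parallel across the cluster,
// each emitting (word, 1); the reduce phase sums the counts per word.
public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);        // emit (word, 1) for each token
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();                   // add up the 1s from the mappers
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```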
You can use Task Scheduler to detect the idle state and then start/stop a Hadoop VM with VirtualBox or VMware Player. Alternatively, you can write a PowerShell script that starts and stops the VM based on resource usage.
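One way to sketch that idea (the answer leaves the wiring open, and suggests PowerShell; Java with VBoxManage is used here instead, only to keep the examples in one language) is a tiny helper launched by Task Scheduler: an "on idle" trigger runs it with "start" to boot a headless VirtualBox worker VM, and an "at log on" trigger runs it with "stop" to save the VM's state and hand CPU/RAM back to the user. The VM name "hadoop-worker" is hypothetical, and VBoxManage is assumed to be on the PATH.

```java
// Helper intended to be invoked by Windows Task Scheduler (or any other
// idle-detection mechanism): "start" boots a headless VirtualBox VM,
// "stop" saves its state so the machine's CPU/RAM are freed again.
// "hadoop-worker" is a hypothetical VM name; VBoxManage must be on the PATH.
public class StartStopVm {
    public static void main(String[] args) throws Exception {
        String vm = "hadoop-worker";
        String action = args.length > 0 ? args[0] : "start";
        String[] cmd = "stop".equalsIgnoreCase(action)
                ? new String[] { "VBoxManage", "controlvm", vm, "savestate" }
                : new String[] { "VBoxManage", "startvm", vm, "--type", "headless" };

        // Run VBoxManage and propagate its exit code.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

Scheduling `java StartStopVm start` on an idle trigger and `java StartStopVm stop` on a log-on trigger would approximate the join/leave behaviour described in the question, with Task Scheduler doing the actual idle detection.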
Hadoop is not a computation grid; it is more of a data grid (see slide 9 in this presentation). The point is that with Hadoop the data is spread over the cluster, so the data has to be stored on those computers. The time it would take to copy the data over and remove it when the machines are no longer idle would probably not be worth it - you'd be better off using Hadoop in the cloud (Amazon, Azure, etc.).