Is there a good online tutorial for Hadoop development on a Windows 7 machine? [closed]

I've been following the awesome Yahoo! Hadoop tutorial, which worked great for getting a virtual machine environment set up (Module 3 of the tutorial). But now I'm stumped by the HDFS section (Module 2), and I think it might be easier if I had a Windows-specific tutorial. I tried following this one, but some of the steps weren't quite right. I've been looking for a good tutorial that will work on my Windows 7 machine, but I'm a bit stuck. Is there a good place to go for this? Hadoop seems to be very geared toward Linux users, and unfortunately I have to use my work laptop, which runs Windows 7. Can I make this work, or does it really only work for Linux users?

Steph asked Feb 02 '23 14:02



1 Answer

The Hadoop tutorial on the Yahoo Developer Network is outdated and problematic. Half of the steps didn't work for me at all (I was running their image in VMware Player on Windows 7), and the other half were vague. The Java code examples were poorly written and wouldn't compile. At any rate, they are written for the old Hadoop API.

I gave up on that tutorial and instead used the Cloudera Demo VM image. This comes pre-configured with Hadoop, Pig, Hive, HBase, etc. I was in business at once and had no problems compiling and running Hadoop jobs and Pig scripts.

The Cloudera Demo VM downloads on their main support page (https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM) are all 64-bit. If you are looking for a 32-bit version like I was, you can get one here: https://downloads.cloudera.com/cloudera-demo-0.3.7.vmwarevm.tar.bz2

This one has a slightly older version of the Cloudera distro (CDH3u0) running on Ubuntu 10.10 with the Gnome desktop. I installed Eclipse for compiling my Hadoop jobs, but didn't bother trying to install the Hadoop plugin, which I've heard is problematic. The first time around, I accidentally updated the Cloudera distro to CDH3u3 via the system's Update Manager, which messed up my Hadoop configuration. I didn't know how to reconfigure it properly, so I just started over from the original image.

To get Pig running, you need to first set the JAVA_HOME variable: export JAVA_HOME=/usr/lib/jvm/java-6-sun
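As a rough sketch of that step, the snippet below sets JAVA_HOME only if the directory actually exists, so a typo or a different JDK location shows up right away instead of as a confusing Pig error. The path is the one from this answer; the variable name JAVA_HOME_CANDIDATE is made up for illustration, and the JDK may live elsewhere on other images.

```shell
# Path taken from the answer above; it may differ on other VM images.
# JAVA_HOME_CANDIDATE is a hypothetical helper variable, not part of Pig or Hadoop.
JAVA_HOME_CANDIDATE=/usr/lib/jvm/java-6-sun

if [ -d "$JAVA_HOME_CANDIDATE" ]; then
    # The directory exists, so export it for Pig and Hadoop to pick up.
    export JAVA_HOME="$JAVA_HOME_CANDIDATE"
    echo "JAVA_HOME set to $JAVA_HOME"
else
    # Do not export a bad path; point at where JDKs usually live on Ubuntu.
    echo "No JDK found at $JAVA_HOME_CANDIDATE; check /usr/lib/jvm for the right directory" >&2
fi
```

An export like this only lasts for the current shell session, so you may want to add it to ~/.bashrc if you don't want to retype it every time.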

Unfortunately, I wasted a ton of time with that old YDN tutorial before a Java developer friend familiar with Hadoop pointed me to the Cloudera distribution.

Allen answered Feb 05 '23 04:02
