 

Installing HBase / Hadoop on an EC2 cluster

I know that I can spin up an EC2 cluster with Hadoop installed (unless I am wrong about that). How about HBase? Can I get Hadoop and HBase premade, ready to go? Or do I need to get my hands dirty? If that is not an option, what is the best option? Cloudera apparently has a package with both. Is that the way to go?

Thanks for the help.

asked Feb 25 '11 by delmet

People also ask

Is Hadoop required for HBase?

HBase can be used without Hadoop: running HBase in standalone mode uses the local file system. What Hadoop contributes here is HDFS, a distributed file system with redundancy and the ability to scale to very large sizes.
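For what it's worth, here is a minimal standalone sketch; the install path, version, and data directory are placeholders, not anything from the question or answer:

# Minimal standalone-mode sketch (paths and version are examples only)
cd /usr/local/hbase-0.20.6

# Point hbase.rootdir at a local directory; with a file:// URI HBase
# runs in a single process and never touches HDFS
cat > conf/hbase-site.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///var/hbase-data</value>
  </property>
</configuration>
EOF

# Start the standalone instance and check it from the HBase shell
bin/start-hbase.sh
echo "status" | bin/hbase shell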

Is Amazon EMR free?

Amazon EMR pricing is simple and predictable: you pay a per-second rate for every second you use, with a one-minute minimum. A 10-node cluster running for 10 hours costs the same as a 100-node cluster running for one hour. Amazon EMR pricing depends on how you deploy your EMR applications.
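A quick back-of-the-envelope check of that claim: what you are billed for is node-time, so the two clusters come out identical. The shell arithmetic below only counts node-hours; no real EMR rate is used:

# Cost scales with node-hours, not cluster size
echo "10 nodes * 10 hours = $((10 * 10)) node-hours"
echo "100 nodes * 1 hour  = $((100 * 1)) node-hours"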

What is HBase AWS?

Apache HBase is an open-source, NoSQL, distributed big data store. It enables random, strictly consistent, real-time access to petabytes of data. HBase is very effective for handling large, sparse datasets.
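Tying this back to the original question (a premade Hadoop + HBase cluster), the managed route today is EMR. A hedged sketch with the modern AWS CLI follows; the cluster name, release label, instance type, count, and key name are placeholders to adjust, not values from the question:

# Sketch: launch an EMR cluster with Hadoop and HBase preinstalled
aws emr create-cluster \
  --name "hbase-cluster" \
  --release-label emr-6.10.0 \
  --applications Name=Hadoop Name=HBase \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-keypair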


1 Answer

HBase ships with a set of EC2 scripts (under contrib/ec2) that get you set up and ready to go very quickly. They let you configure the number of ZooKeeper servers as well as slave nodes; I'm not sure which versions include them, but I'm using 0.20.6. After setting up some of your S3/EC2 information, you can do things like:

/usr/local/hbase-0.20.6/contrib/ec2/bin/launch-hbase-cluster CLUSTERNAME SLAVES ZKSERVERS

to quickly start using the cluster. It's nice because it installs LZO support for you as well.
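Once the cluster comes up, one quick sanity check is to SSH to the master with the keypair you configured and poke it from the HBase shell. The key path and master hostname below are placeholders; use whatever the launch script reports:

# Placeholders: your key path and the master's public DNS; the public
# AMIs of that era typically log in as root (adjust if yours differs)
ssh -i ~/.ec2/my-keypair.pem root@ec2-master-public-dns

# On the master, using the same install prefix as above
echo "status" | /usr/local/hbase-0.20.6/bin/hbase shell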

Here are some parameters from the environment file in the bin directory that might be useful (if you want a 0.20.6 AMI):

# The version of HBase to use.
HBASE_VERSION=0.20.6

# The version of Hadoop to use.
HADOOP_VERSION=0.20.2

# The Amazon S3 bucket where the HBase AMI is stored.
# Change this value only if you are creating your own (private) AMI
# so you can store it in a bucket you own.
#S3_BUCKET=apache-hbase-images
S3_BUCKET=720040977164

# Enable public access web interfaces
ENABLE_WEB_PORTS=false

# Extra packages
# Allows you to add a private Yum repo and pull packages from it as your
# instances boot up. Format is <repo-descriptor-URL> <pkg1> ... <pkgN>
# The repository descriptor will be fetched into /etc/yum/repos.d.
EXTRA_PACKAGES=

# Use only c1.xlarge unless you know what you are doing
MASTER_INSTANCE_TYPE=${MASTER_INSTANCE_TYPE:-c1.xlarge}

# Use only c1.xlarge unless you know what you are doing
SLAVE_INSTANCE_TYPE=${SLAVE_INSTANCE_TYPE:-c1.xlarge}

# Use only c1.medium unless you know what you are doing
ZOO_INSTANCE_TYPE=${ZOO_INSTANCE_TYPE:-c1.medium}

You also might need to set your Java version if JAVA_HOME is not set in the AMI (and I don't think it is). Newer versions of HBase are probably available in S3 buckets; just do a describe-images and grep for hadoop/hbase to narrow the results.
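Something like the following should do that search; the old ec2-api-tools command is shown first and a modern AWS CLI equivalent second, and the exact flags are assumptions to check against your tooling version:

# Era-appropriate ec2-api-tools (the -a "all images" flag is assumed)
ec2-describe-images -a | grep -i hbase

# Modern AWS CLI equivalent
aws ec2 describe-images --filters "Name=name,Values=*hbase*" \
  --query 'Images[].[ImageId,Name]' --output table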

answered Oct 13 '22 by Mike