
Impala on Hadoop 2.2.0 without CDH?

I want to test and configure Impala with my own Hadoop 2.2.0 distribution, not a Cloudera one.

I want to know if it's possible to use Impala without CDH, because everything I've read says Impala depends on CDH.

I'm trying to follow the guide in the Impala GitHub repository (https://github.com/cloudera/impala), and I'll make whatever changes are needed to get it working.

Has anyone already done this, or is it really impossible?

asked Dec 24 '13 12:12 by BAndrade



People also ask

How does Impala work in Hadoop?

Impala uses the distributed filesystem HDFS as its primary data storage medium. Impala relies on the redundancy provided by HDFS to guard against hardware or network outages on individual nodes. Impala table data is physically represented as data files in HDFS, using familiar HDFS file formats and compression codecs.

Can Impala replace Hive?

Impala does not replace the batch processing frameworks built on MapReduce such as Hive. Hive and other frameworks built on MapReduce are best suited for long running batch jobs, such as those involving batch processing of Extract, Transform, and Load (ETL) type jobs.

Which is faster Hive or Impala?

Impala is generally faster than Hive because it is a separate query engine, whereas Hive runs its queries as MapReduce jobs, which are slow because they write intermediate results to disk between stages.

Is Impala part of Hadoop?

Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.


1 Answer

I think there are two things here that should be addressed separately:

  1. Running Impala on non-CDH Hadoop. It is possible, though it is not tested or supported by Cloudera. However, other Hadoop distributions do include Impala: MapR's distribution includes Cloudera Impala, and Amazon announced support for Impala on Elastic MapReduce, and both have tested that it works with their distributions. I assume you're not using MapR either, but my point is just that it is possible.
  2. Running Impala on Hadoop 2.2.0. This is also possible: the CDH5 beta 1 release includes Hadoop 2.2.0, so Impala versions 1.2 and higher should work. Please do make sure you use the latest version (1.2.3 as of this writing), because there are a number of important fixes in the last few minor releases.
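To make the version guidance in point 2 concrete, here is a minimal Python sketch that checks an Impala version string against the thresholds mentioned above (1.2 as the minimum for Hadoop 2.2.0, 1.2.3 as the latest release at the time). The function names and messages are hypothetical; the sketch only compares version numbers and cannot tell you whether a particular build actually runs on your cluster.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.2.3' into a comparable tuple (1, 2, 3)."""
    # Drop any build suffix (e.g. '-cdh5.0.0' or '-SNAPSHOT') before splitting.
    core = text.strip().split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


# Thresholds taken from the answer text: Impala 1.2+ should work with
# Hadoop 2.2.0, and 1.2.3 was the newest release when it was written.
MIN_IMPALA = (1, 2)
RECOMMENDED_IMPALA = (1, 2, 3)


def check_impala(version: str) -> str:
    """Classify an Impala version (hypothetical helper, illustration only)."""
    v = parse_version(version)
    # Python compares tuples element by element, so (1, 1, 9) < (1, 2).
    if v < MIN_IMPALA:
        return "too old for Hadoop 2.2.0"
    if v < RECOMMENDED_IMPALA:
        return "should work, but upgrade to pick up the recent fixes"
    return "ok"


if __name__ == "__main__":
    for candidate in ["1.1.1", "1.2.1", "1.2.3"]:
        print(candidate, "->", check_impala(candidate))
```

Using tuples keeps the comparison numeric, so "1.10" correctly sorts after "1.2", which a plain string comparison would get wrong.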

So yeah, it's possible, though it probably won't be a smooth installation and there isn't a lot of help for this use case. Good luck!

answered Nov 01 '22 16:11 by Matt
