
Hadoop Hive slow queries

I am new to Hadoop Hive and I am developing a reporting solution. The problem is that my queries are really slow (Hive 0.10, HBase 0.94, Hadoop 1.1.1). One of the queries is:

select a.*, b.country, b.city from p_country_town_hotel b 
    inner join p_hotel_rev_agg_period a  on
    (a.key.hotel = b.hotel) where b.hotel = 'AdriaPraha' and a.min_date < '20130701'
    order by a.min_date desc  
    limit 10;

which takes quite a long time (50 s). I know, I know, the join is on a string field rather than an integer, but the data sets are not big (ca. 3,300 and 100,000 records). I tried hints on this query, but that didn't make it any faster. The same query on MS SQL Server takes 1 s. Even a simple count(*) on the 3,300-row table takes 7-8 s, which is shocking. I really don't know what the issue is. Any ideas, or have I misunderstood what Hadoop is for?
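To see how much work Hive actually schedules for a query like this, you can prefix it with EXPLAIN, which prints the execution plan without running it. A join followed by an ORDER BY will typically compile into more than one MapReduce stage, but the exact plan depends on the Hive version and session settings, so this is something to inspect on your own cluster rather than a known result:

```sql
-- Print the plan instead of executing the query.
-- Each stage listed as "Map Reduce" in the output is a separate Hadoop job,
-- and each job pays its own submission and task-startup cost.
EXPLAIN
select a.*, b.country, b.city from p_country_town_hotel b
    inner join p_hotel_rev_agg_period a on
    (a.key.hotel = b.hotel) where b.hotel = 'AdriaPraha' and a.min_date < '20130701'
    order by a.min_date desc
    limit 10;
```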

asked May 03 '13 12:05 by user2346868


2 Answers

Yes, you have misunderstood Hadoop. Hadoop, and Hive as well, are not meant for real-time work. They are best suited to offline, batch processing, and they are not a replacement for an RDBMS. You can do some fine-tuning, but "absolute real time" is not possible. A lot happens under the hood when you run a Hive query, which I think you may not be aware of. First of all, your Hive query is converted into one or more MapReduce (MR) jobs, followed by several other steps such as split creation, record generation, mapper creation, etc. I would never suggest Hadoop (or Hive) if you have real-time needs.
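As an example of the fine-tuning mentioned above, here is a sketch of session settings that can trim some of the fixed per-job overhead on small inputs, assuming Hive 0.10 as in the question. Whether each one takes effect for HBase-backed tables has not been verified here, and none of them turns Hive into an interactive engine:

```sql
-- Run a job in a local JVM when its input is small, instead of
-- submitting it to the cluster (property available since Hive 0.7).
SET hive.exec.mode.local.auto=true;

-- Let Hive convert a join into a map-side join when one side is small,
-- rather than relying on a manual MAPJOIN hint.
SET hive.auto.convert.join=true;

-- Answer simple SELECT ... WHERE ... LIMIT queries with a fetch task,
-- skipping MapReduce entirely (property available since Hive 0.10).
-- Aggregations such as count(*) still need a MapReduce job.
SET hive.fetch.task.conversion=more;
```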

You might want to have a look at Impala for your real-time needs.

answered Oct 09 '22 10:10 by Tariq


Hive is not the appropriate tool for a real-time job, but if you want to leverage the Hadoop infrastructure with real-time or fast data access, take a look at HBase. Its value-add is all about fast access. I'm not sure why you chose Hadoop for your solution, but HBase sits on top of HDFS, which some people like because of the inherent redundancy HDFS offers (you copy a file onto it once and it is automatically replicated); that may be one of the reasons you are looking into Hadoop.

For more info: read this question

answered Oct 09 '22 12:10 by Engineiro