When I run queries with Hive in the VirtualBox Sandbox, I notice that Select count(*) is much slower than Select *. Can anyone explain what is going on behind the scenes, and why this delay happens?



Why is Select Count(*) slower than Select * in hive

2 Answers

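select * from table

can be a map-only job, but

Select Count(*) from table

needs a map and a reduce phase: each mapper counts the rows in its own part of the table, and a reducer then has to add those partial counts together. Scheduling and running that extra reduce stage is where the delay comes from.

Hope this helps.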
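A minimal Python sketch of the difference, under a simplified model (hypothetical in-memory input splits, one mapper per split; this is not Hive's actual code): the SELECT * path just passes rows through each mapper, while COUNT(*) needs an extra step where the partial counts from every mapper are combined.

```python
# Toy model of the two query shapes above; this is NOT Hive itself.
# Assumption: the table is stored as several input splits, and each
# split is processed by its own mapper.

def select_star(splits):
    """Map-only: each mapper passes its rows through; no reduce phase."""
    output = []
    for split in splits:          # one mapper per split
        output.extend(split)      # rows are emitted unchanged
    return output

def count_star(splits):
    """Map + reduce: mappers emit partial counts, one reducer sums them."""
    partial_counts = [len(split) for split in splits]  # map phase
    # Shuffle: every partial count must reach the same reducer.
    return sum(partial_counts)                         # reduce phase

# Hypothetical (id, name) rows spread over three splits.
splits = [[(1, "a"), (2, "b")], [(3, "c")], [(4, "d"), (5, "e"), (6, "f")]]
print(len(select_star(splits)))  # 6
print(count_star(splits))        # 6
```

The summing itself is trivial. On a real cluster the cost lies in launching the job and running the additional reduce stage, not in the arithmetic.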


answered Sep 19 '22 04:09

Mask

There are three types of operations that a Hive query can perform.

Here they are, in order from cheapest and fastest to most expensive and slowest.

A Hive query can be a metadata-only request.

SHOW TABLES and DESCRIBE table are examples. For these queries, the Hive process only performs a lookup in the metadata server (the metastore). The metastore is backed by a SQL database, often MySQL, but the actual database is configurable.
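To illustrate why these requests are cheap, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the metastore. The table and column names (table_meta, column_meta) are hypothetical and far simpler than the real metastore schema; the point is only that SHOW TABLES and DESCRIBE are answered from metadata rows without touching the table's data files.

```python
import sqlite3

# Stand-in for the Hive metastore: an in-memory SQLite database.
# The schema below is hypothetical and much simpler than the real one.
metastore = sqlite3.connect(":memory:")
metastore.execute("CREATE TABLE table_meta (name TEXT PRIMARY KEY)")
metastore.execute(
    "CREATE TABLE column_meta (table_name TEXT, col_name TEXT, col_type TEXT)"
)
metastore.executemany("INSERT INTO table_meta VALUES (?)", [("events",), ("users",)])
metastore.executemany(
    "INSERT INTO column_meta VALUES (?, ?, ?)",
    [("events", "id", "int"), ("events", "payload", "string")],
)

def show_tables():
    """Answer SHOW TABLES with a metadata lookup; no table data is read."""
    rows = metastore.execute("SELECT name FROM table_meta ORDER BY name")
    return [name for (name,) in rows]

def describe(table):
    """Answer DESCRIBE <table> from column metadata alone."""
    rows = metastore.execute(
        "SELECT col_name, col_type FROM column_meta "
        "WHERE table_name = ? ORDER BY rowid",
        (table,),
    )
    return list(rows)

print(show_tables())       # ['events', 'users']
print(describe("events"))  # [('id', 'int'), ('payload', 'string')]
```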

A Hive query can be an HDFS get request. SELECT * FROM table would be an example. In this case, Hive can return the results by reading the table's files straight from HDFS, more or less what hadoop fs -get does, without launching a job.
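A rough Python sketch of such a fetch, assuming the table is simply a directory of text files with one row per line. Rows are tab-separated here for readability (Hive's default text-format field delimiter is actually the \001 control character), and the part-file names are hypothetical. Nothing is scheduled: the client just reads the files in order.

```python
import os
import tempfile

# Sketch of a "fetch" for SELECT * FROM table: read the table's files
# directly, with no job submitted. Assumption: plain-text files, one
# tab-separated row per line.

def fetch_all(table_dir):
    rows = []
    for name in sorted(os.listdir(table_dir)):      # each data file
        with open(os.path.join(table_dir, name)) as f:
            for line in f:
                rows.append(line.rstrip("\n").split("\t"))
    return rows

with tempfile.TemporaryDirectory() as table_dir:
    # Hypothetical data files standing in for the table's HDFS directory.
    with open(os.path.join(table_dir, "part-00000"), "w") as f:
        f.write("1\talice\n2\tbob\n")
    with open(os.path.join(table_dir, "part-00001"), "w") as f:
        f.write("3\tcarol\n")
    result = fetch_all(table_dir)

print(result)  # [['1', 'alice'], ['2', 'bob'], ['3', 'carol']]
```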

A Hive query can be a MapReduce job.

Hive has to ship the jar to HDFS, the JobTracker queues the tasks, the TaskTrackers execute the tasks, and the final data is put into HDFS or shipped to the client. Each of these steps adds overhead before a single row is processed.
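The lifecycle above can be sketched as a toy Python model. Everything in it is a simplified stand-in chosen for illustration: hdfs is a plain dict, the JobTracker is a FIFO queue, the TaskTrackers are a loop, and run_job, query.jar and /tmp/out are hypothetical names. It shows that uploading code, queueing and writing output all happen regardless of how little work each task does.

```python
from collections import deque

# Toy model of the MapReduce job lifecycle described above.
# `hdfs` is a dict, the JobTracker is a FIFO queue, and the
# TaskTrackers are a simple loop. All names are hypothetical.

def run_job(hdfs, jar_name, map_fn, splits, output_path):
    log = []
    hdfs[jar_name] = map_fn                  # 1. ship the job code to HDFS
    log.append(f"uploaded {jar_name}")
    queue = deque(enumerate(splits))         # 2. JobTracker queues one task per split
    log.append(f"queued {len(queue)} tasks")
    results = []
    while queue:                             # 3. TaskTrackers run the queued tasks
        task_id, split = queue.popleft()
        code = hdfs[jar_name]                #    each task fetches the shipped code
        results.extend(code(split))
        log.append(f"task {task_id} done")
    hdfs[output_path] = results              # 4. final data is written back to HDFS
    log.append(f"wrote {output_path}")
    return log

hdfs = {}
log = run_job(hdfs, "query.jar", lambda rows: [r * 2 for r in rows],
              [[1, 2], [3]], "/tmp/out")
print(log)
print(hdfs["/tmp/out"])  # [2, 4, 6]
```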

A MapReduce job also comes in different shapes.

It can be a map-only job. In SELECT * FROM table WHERE id > 100, for example, all of the logic can be applied in the mapper.
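A minimal Python sketch of that map-only case, using hypothetical (id, name) rows. Each mapper filters its own split independently; because no row has to be compared with rows from another split, there is no shuffle and no reduce phase.

```python
# Map-only sketch of: SELECT * FROM table WHERE id > 100
# Rows are hypothetical (id, name) tuples, one list per input split.

def mapper(split):
    # The WHERE clause is evaluated row by row inside the mapper.
    return [row for row in split if row[0] > 100]

def map_only_job(splits):
    output = []
    for split in splits:
        output.extend(mapper(split))  # each mapper's output is final output
    return output

splits = [[(50, "a"), (150, "b")], [(101, "c"), (7, "d")]]
print(map_only_job(splits))  # [(150, 'b'), (101, 'c')]
```

In a real job each mapper would write its own output file; concatenating the lists here stands in for reading those files back.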

It can be a map and reduce job, for example SELECT min(id) FROM table; or SELECT * FROM table ORDER BY id; (SELECT COUNT(*) FROM table falls into this category too).
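A Python sketch of why these two queries need a reduce step, again over hypothetical (id, name) rows: the answer depends on rows from every split, so partial results from the mappers must be brought together. For ORDER BY, each mapper sorts its own split and a single reducer merges the sorted runs into one total order, which is roughly why a global ORDER BY in classic Hive funnels all rows through one reducer.

```python
import heapq

# Map + reduce sketches for:
#   SELECT min(id) FROM table;
#   SELECT * FROM table ORDER BY id;
# Rows are hypothetical (id, name) tuples, one list per input split.

splits = [[(42, "a"), (7, "b")], [(19, "c")], [(3, "d"), (88, "e")]]

def min_id(splits):
    # Map: each mapper finds the minimum id in its split (empty splits skipped).
    partial_mins = [min(row[0] for row in split) for split in splits if split]
    # Reduce: one reducer takes the minimum of the partial minimums.
    return min(partial_mins)

def order_by_id(splits):
    # Map side: each mapper sorts its own split by id.
    sorted_runs = [sorted(split, key=lambda row: row[0]) for split in splits]
    # Reduce: a single reducer merges the sorted runs into one global order.
    return list(heapq.merge(*sorted_runs, key=lambda row: row[0]))

print(min_id(splits))       # 3
print(order_by_id(splits))  # [(3, 'd'), (7, 'b'), (19, 'c'), (42, 'a'), (88, 'e')]
```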

It can also require multiple MapReduce passes, but I think the above summarizes the main behaviors.
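As an illustration of multiple passes, here is a Python sketch of a query such as SELECT name, count(*) AS c FROM table GROUP BY name ORDER BY c DESC, modelled as two chained map/reduce passes over hypothetical (id, name) rows. The second pass needs the finished counts before it can sort them, so it cannot start until the first pass has produced its output. Hive's real plan for a given query may differ; this only shows the dependency between passes.

```python
from collections import defaultdict

# Two chained map/reduce passes, sketching:
#   SELECT name, count(*) AS c FROM table GROUP BY name ORDER BY c DESC;
# Rows are hypothetical (id, name) tuples, one list per input split.

def pass1_group_count(splits):
    shuffled = defaultdict(list)
    for split in splits:              # map: emit (name, 1) for each row
        for _id, name in split:
            shuffled[name].append(1)  # shuffle: group values by key
    # reduce: sum the ones for each name
    return {name: sum(ones) for name, ones in shuffled.items()}

def pass2_order_by_count(counts):
    # Single reducer produces a total order over pass 1's output
    # (ties broken by name so the result is deterministic).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

splits = [[(1, "x"), (2, "y")], [(3, "x"), (4, "z"), (5, "x")], [(6, "y")]]
print(pass2_order_by_count(pass1_group_count(splits)))
# [('x', 3), ('y', 2), ('z', 1)]
```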

answered Sep 22 '22 04:09

Pearl90

Related questions

Create a hard link from a file handle on Unix?
Django filter() on field of related model
How to track user location in background?
Prevent compiler/cpu instruction reordering c#
Rails 4: schema.db shows "Could not dump table "events" because of following NoMethodError# undefined method `[]' for nil:NilClass"
multiprocessing pool.map call functions in certain order
Group All Related Records in Many to Many Relationship, SQL graph connected components
Status expected:<200> but was:<404> in spring test
Loading X509Certificate results in exception CryptographicException "Cannot find the original signer"
Add a view on top of all the Activities
How to display Only spaces (...) without the ¶ in Netbeans "Show Non-printable Characters" mode?
using cm in responsive media queries?


Tags:

Haris N I
