
DISTRIBUTE BY clause in Hive

Tags:

hadoop

hive

Please explain, or provide a link about, what DISTRIBUTE BY really does in Hive. How can it control which reducer a file is sent to?

RAVITEJA SATYAVADA asked Dec 02 '22 21:12



1 Answer

DISTRIBUTE BY controls how map output is divided among reducers. By default, MapReduce computes a hash of the keys emitted by the mappers and uses the hash values to spread the key-value pairs as evenly as it can across the available reducers. Suppose we want all the rows that share a value in some column to be processed together. Using DISTRIBUTE BY on that column ensures that every record with the same value goes to the same reducer. Note that it routes rows, not files: a single reducer can still receive rows for several different values, and DISTRIBUTE BY on its own does not sort them.

DISTRIBUTE BY is similar to GROUP BY in the sense that it controls how reducers receive rows for processing. Hive requires the DISTRIBUTE BY clause to come before the SORT BY clause when both appear in the same query.
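To make the routing concrete, here is a rough Python simulation of what DISTRIBUTE BY followed by SORT BY does to a handful of rows. It is only a sketch: the table, the column names, and the reducer count are made up for illustration, and `zlib.crc32` is a deterministic stand-in, not the hash function Hive actually uses.

```python
import zlib
from collections import defaultdict

# Hypothetical query this sketch mimics (table and column names are made up):
#   SELECT user_id, ts, amount FROM orders
#   DISTRIBUTE BY user_id SORT BY user_id, ts;

NUM_REDUCERS = 3  # assumption; in a real job Hive or the user decides this


def reducer_for(key, num_reducers=NUM_REDUCERS):
    """Pick a reducer index from a hash of the DISTRIBUTE BY key.

    crc32 is only a deterministic stand-in, not Hive's hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def distribute_and_sort(rows, distribute_key, sort_key):
    """Route rows to reducers by key (DISTRIBUTE BY), then sort the rows
    inside each reducer independently (SORT BY)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[reducer_for(row[distribute_key])].append(row)
    return {r: sorted(b, key=sort_key) for r, b in buckets.items()}


rows = [
    {"user_id": "alice", "ts": 3, "amount": 10},
    {"user_id": "bob", "ts": 1, "amount": 25},
    {"user_id": "alice", "ts": 1, "amount": 7},
    {"user_id": "carol", "ts": 2, "amount": 4},
    {"user_id": "bob", "ts": 2, "amount": 12},
]

result = distribute_and_sort(rows, "user_id", lambda r: (r["user_id"], r["ts"]))
for reducer, reducer_rows in sorted(result.items()):
    print(reducer, [(r["user_id"], r["ts"]) for r in reducer_rows])
```

Whatever the hash values turn out to be, all rows for one `user_id` land on the same reducer, and the ordering is only guaranteed within each reducer, not across the whole output.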

Navneet Kumar answered Dec 28 '22 13:12
