I am a bit new to Hadoop. As far as I know, buckets are a fixed number of partitions in a Hive table, and Hive uses the same number of reducers as the total number of buckets defined while creating the table. So can anyone tell me how to calculate the total number of buckets in a Hive table? Is there a formula for calculating the total number of buckets?
Bucketing in Hive is the concept of breaking data down into ranges, known as buckets, to give extra structure to the data so it can be queried more efficiently. The range for a bucket is determined by the hash value of one or more columns in the dataset (or Hive metastore table).
Partitioned tables can be bucketed to separate the data further and make queries more efficient. Every bucket is stored as a file within the table's (or the partition's) directory on HDFS.
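To make this concrete, here is a minimal HiveQL sketch of a partitioned, bucketed table (the table name, columns, and bucket count are hypothetical, chosen only for illustration):

```sql
-- Hypothetical example: a partitioned table bucketed on user_id.
-- Each view_date partition directory on HDFS will contain 32 files,
-- one per bucket.
CREATE TABLE page_views (
  user_id  INT,
  page_url STRING
)
PARTITIONED BY (view_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;
```

On older Hive versions (before 2.0) you also had to set hive.enforce.bucketing=true before inserting, so that Hive would launch one reducer per bucket and actually write one file per bucket.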
Let's take a scenario where the table size is 2300 MB and the HDFS block size is 128 MB.
Now divide: 2300 / 128 ≈ 17.97.
Now, remember that the number of buckets is conventionally chosen as a power of 2.
So we need to find n such that 2^n > 17.97.
n = 5
So I am going to use 2^5 = 32 buckets.
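The same rule of thumb can be written as a one-off query if you prefer to let Hive do the arithmetic (just a sketch; 2300 and 128 are the sizes from the scenario above):

```sql
-- Smallest power of 2 greater than table_size / block_size:
-- 2300 / 128 ≈ 17.97, log2(17.97) ≈ 4.17, ceil(...) = 5, 2^5 = 32.
SELECT pow(2, ceil(log2(2300 / 128)));  -- returns 32
```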
Hope it helps some of you.
From the documentation link:
In general, the bucket number is determined by the expression hash_function(bucketing_column) mod num_buckets. (There's a 0x7FFFFFFF in there too, but that's not that important.) The hash_function depends on the type of the bucketing column. For an int, it's easy: hash_int(i) == i. For example, if user_id were an int, and there were 10 buckets, we would expect all user_ids that end in 0 to be in bucket 1, all user_ids that end in a 1 to be in bucket 2, etc.
For other datatypes, it's a little tricky. In particular, the hash of a BIGINT is not the same as the BIGINT. And the hash of a string or a complex datatype will be some number derived from the value, but not anything humanly recognizable. For example, if user_id were a STRING, then the user_ids in bucket 1 would probably not end in 0. In general, distributing rows based on the hash will give you an even distribution in the buckets.
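You can check the quoted expression directly with Hive's built-in hash and pmod functions (a sketch; page_views is the hypothetical table from the earlier example, and note that Hive numbers bucket files from 0, so the documentation's "bucket 1" is index 0 here):

```sql
-- Bucket index per the quoted formula: hash(user_id) mod 10.
-- For an INT column hash(i) == i, so the index is the last decimal
-- digit; pmod keeps the result non-negative, serving the same purpose
-- as the 0x7FFFFFFF mask mentioned in the documentation.
SELECT user_id,
       pmod(hash(user_id), 10) AS bucket_index
FROM page_views;
-- e.g. user_id 1230 -> 0, 1231 -> 1, 1239 -> 9
```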