Hive clustered by on more than one column

Tags:

hadoop

hive

buckets

I understand that when a Hive table is CLUSTERED BY one column, Hive applies a hash function to that bucketing column and uses the result to place each row into one of the buckets. There is one file per bucket, i.e. with 32 buckets there are 32 files in HDFS.

What does it mean to use CLUSTERED BY on more than one column? For example, let's say the table has CLUSTERED BY (continent, country) INTO 32 BUCKETS.

How is the hash function computed when there is more than one column?

How many files would be generated? Is it still 32?

asked Jun 16 '15 15:06

Manikandan Kannan

1 Answer

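Yes, the number of files will still be 32. The bucket count comes from INTO 32 BUCKETS, not from the number of columns in CLUSTERED BY.
The hash function takes both columns as input: the pair (continent, country) is treated as one composite key, and the resulting hash modulo 32 selects the bucket. As far as I know, Hive combines the per-column hash codes into a single value rather than hashing a literally concatenated "continent,country" string, but the effect is the same: rows with the same (continent, country) pair always land in the same bucket.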
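
To make the idea concrete, here is a rough Python sketch of how a multi-column bucket number could be derived. It is only an illustration under assumptions: the 31-based hash combination and the mask-then-modulo step mirror my understanding of Hive's older bucketing scheme, newer Hive versions may use a different hash, and the names `column_hash` and `bucket_for` are made up for this example and are not Hive APIs.

```python
NUM_BUCKETS = 32  # from "INTO 32 BUCKETS" in the question


def to_int32(n: int) -> int:
    """Wrap an integer to a signed 32-bit value, like Java int arithmetic."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def column_hash(value: str) -> int:
    """Hypothetical per-column hash: 31-based rolling hash over UTF-8 bytes."""
    h = 0
    for byte in value.encode("utf-8"):
        h = to_int32(31 * h + byte)
    return h


def bucket_for(columns, num_buckets: int = NUM_BUCKETS) -> int:
    """Combine the per-column hashes into one value, then pick a bucket."""
    combined = 0
    for value in columns:
        combined = to_int32(31 * combined + column_hash(value))
    # Clear the sign bit so the modulo result is never negative.
    return (combined & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for row in [("Asia", "India"), ("Asia", "Japan"), ("Europe", "France")]:
        print(row, "-> bucket", bucket_for(row))
```

Whatever the exact hash, the result is always in the range 0 to 31, so adding columns to CLUSTERED BY changes which bucket a row lands in but never the number of bucket files.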

Hope it helps!!

answered Sep 22 '22 10:09

Maddy RS
