Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can in insert data multiple times into a bucketed hive table

I have a bucketed hive table. It has 4 buckets.

CREATE TABLE user(user_id BIGINT, firstname STRING, lastname STRING)
COMMENT 'A bucketed copy of user_info'
CLUSTERED BY(user_id) INTO 4 BUCKETS;

Initially i have inserted some records into this table using the following query.

set hive.enforce.bucketing = true;
insert into user
select * from second_user;

After this operation In HDFS I see that 4 files are created under this table dir.

Again i needed to insert another set of data into user table. So i ran the below query.

set hive.enforce.bucketing = true;
insert into user
select * from third_user;

Now another 4 files are crated under user folder dir. Now it has total 8 files.

Is this fine to do this kind of multiple inserts into a bucketed table? Does it affect the bucketing of the table?

like image 316
sunil Avatar asked Nov 01 '22 09:11

sunil


1 Answers

I figured it out!! Actually if you do multiple inserts on a bucketed hive table. Hive wont complain as such. All hive queries will work fine.

Having said that, Such operation spoils the bucketing concept of the table. I mean after multiple inserts into a bucketed table the sampling fails.

The TABLASAMPLE doesnt work properly after multiple inserts.

Even sort merge bucket map join also doesnt work after such operation.

like image 77
sunil Avatar answered Nov 12 '22 23:11

sunil