Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to have multiple hive tables represented within the same HDFS directory structure?

Tags:

hadoop

hive

hdfs

Is it possible to have multiple hive tables represented within the same HDFS directory structure? In other words, is there a way to have multiple hive tables pointing to same/overlapping HDFS paths?

Here is my situation:

I have a table named "mytable", located in hdfs:/tables/mytable

CREATE EXTERNAL TABLE mytable
(
id int,
...
[a whole bunch of columns]
...
PARTITIONED BY (logname STRING)
STORED AS [I-do-not-know-what-just-yet]
LOCATION 'hdfs:/tables/mytable';

So, HDFS will look like:

hdfs:/tables/mytable/logname=tarzan/....
hdfs:/tables/mytable/logname=jane/....
hdfs:/tables/mytable/logname=whoa/....

Is it possible to have a hive table, named "tarzan", located in hdfs:/tables/mytable/logname=tarzan ? Same with hive table "jane", located in hdfs:/tables/mytable/logname=jane, etc.

The tarzan, jane, whoa, etc sub-tables share some columns (timestamp, ip_address, country, user_id, and some others), but there will also be a lot of columns that they do not have in common.

Is there a way to store this data once in HDFS, and use it for multiple tables as I described above? Furthermore, is there a way to store the data in an efficient way, since many of the tables will have columns that are not in common? Would a file format like RCFILE or PARQUET work in this case?

Thanks so much for any hints or help anyone can provide,

like image 867
david1977 Avatar asked Nov 15 '25 13:11

david1977


1 Answers

Yes, we can have multiple hive tables with the same underlying HDFS directory.

Example:

  1. Create table emp and load data file file3 into it.

    create table emp (id int, name string, salary int) 
    row format delimited 
    fields terminated by ','  
    -- default location would be used
    
    load data 
    local inpath '/home/parv/testfiles/file3' 
    into table emp; 
    
  2. Create another table mirror. When you will select data from mirror table, it would be as same as of emp table (contents of file3).

    create table mirror (id int, name string, salary int) 
    row format delimited 
    fields terminated by ',' 
    location 'hdfs:///user/hive/warehouse/parv.db/base';
    
  3. Load data into mirror table. When you will select data either from mirror table or emp table, it would return same results (contents of file3 and file4).

    load data 
    local inpath '/home/parv/testfiles/file4' 
    into table mirror;  
    

Conclusion:

  1. Same data files are shared among both tables emp and mirror.
  2. But, strange, the HDFS filesystem only shows data directory for emp table and not for mirror table. However, both the tables are present in hive and so can be queried.
like image 98
Parvinder Singh Avatar answered Nov 17 '25 08:11

Parvinder Singh



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!