Hive: Creating a table from multiple files in multiple directories

Tags:

hive

I want to create a Hive table whose input text files are spread across multiple sub-directories in HDFS. For example, I have in HDFS:

    /testdata/user/Jan/part-0001
    /testdata/user/Feb/part-0001
    /testdata/user/Mar/part-0001
and so on...

If I want to create a users table in Hive that can traverse the sub-directories of user, can that be done? I tried something like this, but it doesn't work:

    CREATE EXTERNAL TABLE users (id int, name string)
    STORED AS TEXTFILE LOCATION '/testdata/user/*'

I thought adding the wildcard would work, but it doesn't. It still does not work when I leave the wildcard out. However, if I copy the files into the root of the user directory, it works. Is there no way for Hive to traverse the child directories and pick up those files?


asked Jan 27 '12 20:01

user706794

1 Answer

You can create an external table, then add each sub-directory as a partition:

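    -- Partitioned external table; each partition maps to one directory.
    CREATE EXTERNAL TABLE test (id BIGINT) PARTITIONED BY (yymmdd STRING);
    -- 'loc1' and 'loc2' stand in for the sub-directory paths.
    ALTER TABLE test ADD PARTITION (yymmdd = '20120921') LOCATION 'loc1';
    ALTER TABLE test ADD PARTITION (yymmdd = '20120922') LOCATION 'loc2';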
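
Applied to the directory layout in the question, this might look like the sketch below. The partition column name `mth` is made up for illustration, and the sketch assumes the files use Hive's default text field delimiter, since the question does not say how they are delimited.

```sql
-- Sketch only: `mth` is a hypothetical partition column name.
CREATE EXTERNAL TABLE users (id INT, name STRING)
PARTITIONED BY (mth STRING)
STORED AS TEXTFILE
LOCATION '/testdata/user';

-- Register each sub-directory from the question as one partition.
ALTER TABLE users ADD PARTITION (mth = 'Jan') LOCATION '/testdata/user/Jan';
ALTER TABLE users ADD PARTITION (mth = 'Feb') LOCATION '/testdata/user/Feb';
ALTER TABLE users ADD PARTITION (mth = 'Mar') LOCATION '/testdata/user/Mar';

-- Reads every registered partition; add WHERE mth = 'Jan' to read one directory.
SELECT id, name, mth FROM users;
```

Each further month directory would need its own ADD PARTITION statement before its rows show up in queries.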


answered Sep 19 '22 05:09

Rufus
