
Creating Impala external table from a partitioned file structure

Given a partitioned filesystem structure like the following:

logs
└── log_type
    └── 2013
        ├── 07
        │   ├── 28
        │   │   ├── host1
        │   │   │   └── log_file_1.csv
        │   │   └── host2
        │   │       ├── log_file_1.csv
        │   │       └── log_file_2.csv
        │   └── 29
        │       ├── host1
        │       │   └── log_file_1.csv
        │       └── host2
        │           └── log_file_1.csv
        └── 08

I've been trying to create an external table in Impala:

create external table log_type (
    field1    string,
    field2    string,
    ...
)
row format delimited fields terminated by '|' location '/logs/log_type/2013/08';

I hoped Impala would recurse into the subdirectories and load all the CSV files, but no cigar. No errors are thrown, but no data is loaded into the table.

Different globs like /logs/log_type/2013/08/*/* or /logs/log_type/2013/08/*/*/* did not work either.

Is there a way to do this? Or should I restructure the fs - any advice on that?

p7k asked Jan 12 '23 17:01


1 Answer

In case you are still searching for an answer: you need to register each individual partition manually.

See "Registering External Table" for details.

The schema for the table needs to be adjusted to declare the partition columns:

create external table log_type (
        field1    string,
        field2    string,
...)
  partitioned by (year int, month int, day int, host string)
  row format delimited fields terminated by '|';

After you have changed the schema to include year, month, day and host, you have to add each partition to the table, one statement per leaf directory.

Something like this:

ALTER TABLE log_type ADD PARTITION (year=2013, month=07, day=28, host="host1")
    LOCATION '/logs/log_type/2013/07/28/host1';
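Writing one statement per host directory by hand gets tedious, so the statements can be generated. The following is a minimal sketch, not part of the original answer: it assumes every partition directory has the form <base>/<year>/<month>/<day>/<host>, and the function name add_partition_statement and the hard-coded directory list are hypothetical (in practice the list would come from a listing of the filesystem).

```python
from pathlib import PurePosixPath


def add_partition_statement(table, base, partition_dir):
    """Build one ALTER TABLE statement for a <base>/<year>/<month>/<day>/<host> directory."""
    rel = PurePosixPath(partition_dir).relative_to(base)
    # Unpacking raises ValueError if the directory is not exactly four levels below base.
    year, month, day, host = rel.parts
    # int() drops leading zeros, so "07" becomes the integer literal 7.
    return (
        f"ALTER TABLE {table} ADD PARTITION "
        f"(year={int(year)}, month={int(month)}, day={int(day)}, host='{host}') "
        f"LOCATION '{partition_dir}';"
    )


# Leaf directories taken from the tree in the question (hard-coded for illustration).
partition_dirs = [
    "/logs/log_type/2013/07/28/host1",
    "/logs/log_type/2013/07/28/host2",
    "/logs/log_type/2013/07/29/host1",
    "/logs/log_type/2013/07/29/host2",
]

statements = [
    add_partition_statement("log_type", "/logs/log_type", d) for d in partition_dirs
]
print("\n".join(statements))
```

The printed statements can be saved to a file and run with impala-shell -f, or pasted into the shell directly.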

Afterwards you need to refresh the table metadata in Impala.

invalidate metadata log_type;
refresh log_type;
Joey answered Apr 30 '23 00:04