Given a partitioned filesystem structure like the following:
logs
└── log_type
    └── 2013
        ├── 07
        │   ├── 28
        │   │   ├── host1
        │   │   │   └── log_file_1.csv
        │   │   └── host2
        │   │       ├── log_file_1.csv
        │   │       └── log_file_2.csv
        │   └── 29
        │       ├── host1
        │       │   └── log_file_1.csv
        │       └── host2
        │           └── log_file_1.csv
        └── 08
I've been trying to create an external table in Impala:
create external table log_type (
field1 string,
field2 string,
...
)
row format delimited fields terminated by '|' location '/logs/log_type/2013/08';
I was hoping Impala would recurse into the subdirectories and load all the CSV files, but no cigar. No errors are thrown, but no data is loaded into the table.
Different globs like /logs/log_type/2013/08/*/* or /logs/log_type/2013/08/*/*/* did not work either.
Is there a way to do this? Or should I restructure the fs - any advice on that?
In case you are still searching for an answer: you need to register each individual partition manually.
See "Registering External Table" for details.
The schema for your table needs to be adjusted to declare the partition columns:
create external table log_type (
field1 string,
field2 string,
...)
partitioned by (year int, month int, day int, host string)
row format delimited fields terminated by '|';
After you have changed the schema to include year, month, day and host, you have to add each partition to the table individually, one statement per leaf directory.
Something like this:
ALTER TABLE log_type ADD PARTITION (year=2013, month=07, day=28, host="host1")
LOCATION '/logs/log_type/2013/07/28/host1';
Afterwards you need to refresh the table's metadata in Impala:
invalidate metadata log_type;
refresh log_type;
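Writing one ALTER TABLE statement per host directory by hand gets tedious quickly, so it may help to generate them. The following is only a sketch in Python, assuming you already have a list of file paths (for example from a recursive HDFS listing) and that every file sits exactly at year/month/day/host below the table root. The function name partition_statements and its table and root parameters are made up for illustration; they are not part of Impala.

```python
import posixpath


def partition_statements(file_paths, table="log_type", root="/logs/log_type"):
    """Build one ALTER TABLE ... ADD PARTITION statement per host directory.

    Hypothetical helper: assumes the layout <root>/<year>/<month>/<day>/<host>/<file>.
    """
    partitions = set()
    for path in file_paths:
        # Path relative to the table root: year/month/day/host/file
        parts = posixpath.relpath(path, root).split("/")
        if len(parts) != 5:
            continue  # not at the expected depth, skip it
        year, month, day, host, _filename = parts
        if not (year.isdigit() and month.isdigit() and day.isdigit()):
            continue  # date components must be numeric
        partitions.add((int(year), int(month), int(day), host))

    statements = []
    for year, month, day, host in sorted(partitions):
        # Rebuild the directory path with the zero padding used on disk
        location = f"{root}/{year:04d}/{month:02d}/{day:02d}/{host}"
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION "
            f"(year={year}, month={month}, day={day}, host='{host}') "
            f"LOCATION '{location}';"
        )
    return statements


if __name__ == "__main__":
    example_paths = [
        "/logs/log_type/2013/07/28/host1/log_file_1.csv",
        "/logs/log_type/2013/07/28/host2/log_file_1.csv",
        "/logs/log_type/2013/07/28/host2/log_file_2.csv",
        "/logs/log_type/2013/07/29/host1/log_file_1.csv",
        "/logs/log_type/2013/07/29/host2/log_file_1.csv",
    ]
    for statement in partition_statements(example_paths):
        print(statement)
```

The printed statements can then be pasted into impala-shell or saved to a file and run from there. Files that do not match the expected depth are silently skipped, so it is worth checking the output against the directory tree before running it.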