CREATE EXTERNAL TABLE old_events
(day STRING, foo STRING, count STRING, internal_id STRING)
PARTITIONED BY (ds string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '${INPUT}';
CREATE EXTERNAL TABLE events
(internal_id, foo STRING, count STRING)
PARTITIONED BY (ds string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '${OUTPUT}';
INSERT OVERWRITE TABLE events
SELECT e2.internal_id, e2.foo, count(e1.foo)
FROM old_events e2
LEFT OUTER JOIN old_events e1
ON e1.foo = e2.foo
WHERE e1.event = 'event1'
AND e2.event = 'event2'
AND ds = date_sub('${DAY}',1)
GROUP BY e2.internal_id, e2.foo;
FAILED: Error in semantic analysis: Column ds Found in more than One Tables/Subqueries
I am getting this error, when adding ds variable that fetches current date. How do i implemented with the above script to work for date partition.
You need to add an alias to ds
in your WHERE
clause. ex, ds = date_sub('${DAY}',1)
to e2.ds = date_sub('${DAY}',1)
.
To clarify your problem a little bit, here is a smaller example which shows the same behavior
CREATE EXTERNAL TABLE example
(a INT, b INT)
LOCATION '${OUTPUT}';
SELECT *
FROM example e1
JOIN example e2
ON e1.a = e2.a
WHERE b = 5;
This creates the same error:
FAILED: SemanticException Column b Found in more than One Tables/Subqueries
The problem is that column b
exists in both example
aliased as e1
and e2
. You and I might know that if you join example
on itself on column a
then e1.b
is the same as e2.b
so it should not need an alias, but Hive does not know this so you need to pick one to remove any ambiguity. It doesn't matter here whether b
is a partition column or not.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With