I would like to know what happens when a Hive SELECT and an INSERT OVERWRITE run against the same table at the same time. Please help me understand what the query will return in the scenarios below.
1. Start the SELECT first; while it is running, INSERT OVERWRITE the same table.
2. Start the INSERT OVERWRITE first; while it is running, SELECT from the same table.
Are we going to get the old data, new data, mixed data, nothing, or unpredictable data?
I am using MapR 4.0.1, Hive 0.13.
Best regards,
Ryan
The INSERT OVERWRITE DIRECTORY statement with Hive format overwrites the existing data in the directory with the new values, using a Hive SerDe. Hive support must be enabled to use this command. The inserted rows can be specified by value expressions or result from a query.
The INSERT OVERWRITE statement overwrites the existing data in the table using the new values. The inserted rows can be specified by value expressions or result from a query.
INSERT OVERWRITE is used to replace any existing data in the table or partition with the new rows.
-- INSERT OVERWRITE will overwrite any existing data in the table or partition, unless IF NOT EXISTS is provided for a partition (as of Hive 0.9.0).
-- INSERT INTO will append to the table or partition, keeping the existing data intact.
Read Hive Locking:
For a non-partitioned table, the lock modes are pretty intuitive. When the table is being read, an S (shared) lock is acquired, whereas an X (exclusive) lock is acquired for all other operations (insert into the table, alter table of any kind, etc.).
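SELECT and INSERT OVERWRITE therefore acquire incompatible locks and can never run in parallel: whichever acquires its lock first runs, and the other waits. In your first scenario the SELECT returns the old data and the INSERT OVERWRITE waits for it to finish; in the second, the SELECT waits and then returns the new data. Note that this only holds when Hive's lock manager is enabled (hive.support.concurrency=true), which is off by default.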
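To make the S/X rule concrete, here is a minimal Python sketch of shared/exclusive lock compatibility. It is a toy model for illustration only, not Hive's actual lock manager; the function names `compatible` and `can_start` are made up for this example.

```python
# Toy model of shared (S) / exclusive (X) locks on a single table.
# Illustration only: this is not how Hive implements its lock manager.
SHARED, EXCLUSIVE = "S", "X"


def compatible(held, requested):
    """Two locks on the same object can coexist only if both are shared."""
    return held == SHARED and requested == SHARED


def can_start(held_locks, requested):
    """A new operation may start only if its lock is compatible
    with every lock currently held on the table."""
    return all(compatible(h, requested) for h in held_locks)


# Scenario 1: SELECT is running (holds S); INSERT OVERWRITE needs X.
select_then_overwrite = can_start([SHARED], EXCLUSIVE)

# Scenario 2: INSERT OVERWRITE is running (holds X); SELECT needs S.
overwrite_then_select = can_start([EXCLUSIVE], SHARED)

# Two concurrent SELECTs both need S and do not block each other.
two_selects = can_start([SHARED], SHARED)
```

In both scenarios from the question the second statement cannot start until the first releases its lock, so under this model the reader never sees a half-written table.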
For partitioned tables things are a bit more complex, as the locks acquired are hierarchical (S on the table, S or X on the partition). See the Hive Locking documentation for details.
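The hierarchical case can be sketched the same way. The Python below assumes only what is stated above (S on the table, S or X on the partition being touched); the table name `sales`, the `ds=...` partition values, and the helper names are hypothetical, and real Hive has more cases than this covers.

```python
# Toy model of hierarchical locks for a partitioned table.
# Assumption: every operation takes S on the table, plus S (read)
# or X (write) on the one partition it touches.

def locks_needed(operation, table, partition):
    """Return {locked_object: mode} for an operation on one partition."""
    mode = "S" if operation == "SELECT" else "X"
    return {table: "S", (table, partition): mode}


def conflicts(a, b):
    """Two lock sets conflict if they share an object and either mode is X."""
    return any(obj in b and "X" in (mode, b[obj]) for obj, mode in a.items())


read_p1 = locks_needed("SELECT", "sales", "ds=2014-01-01")
write_p1 = locks_needed("INSERT OVERWRITE", "sales", "ds=2014-01-01")
write_p2 = locks_needed("INSERT OVERWRITE", "sales", "ds=2014-01-02")

same_partition_blocked = conflicts(read_p1, write_p1)
other_partition_blocked = conflicts(read_p1, write_p2)
```

Under these assumptions a SELECT and an INSERT OVERWRITE on the same partition still block each other, while operations on different partitions of the same table can proceed together because both only hold S on the table itself.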