Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pl/SQL Splitting a table

I have a table in Oracle. I want to make several tables, which each contains 1/10 data of this table (actually I need only one column). I was able to write the following code, but it doesn't seem to be efficient as it runs through the whole origin table each time.

declare
  baseObjid  Number := 100;
  chunkSize  Number;
  totalCount Number;
begin
  select count(1) into totalCount from table_person;
  chunkSize := trunc(totalCount / 10) + 1;
  for i in 1 .. 10 loop
    execute immediate 'create table table_person_' || i ||
                      ' AS (select sel.r + ' || baseObjid ||
                      ' objid,  sel.objid oldId from 
                      (select rownum r, objid from table_person order by objid) sel 
                      where sel.r > ' || (i - 1) * chunkSize || 
                      ' and sel.r <= ' || i * chunkSize || ')';
    commit;
  end loop;
end;

Is there a way to make it access the original table only once? Any other advice is welcome too.

like image 792
Letanir Avatar asked Feb 17 '26 12:02

Letanir


1 Answers

The simplest way of populating multiple tables from a single query is the INSERT ALL statement.

This query splits the IDs of PERSON across ten pre-created target tables PERSON_1 .. PERSON_10 by applying a modulus to rownum in the driving select. You can amend the projection of the query to give a different criterion.

insert all
    when rn = 1 then into person_1 (id) values (id) 
    when rn = 2 then into person_2 (id) values (id) 
    when rn = 3 then into person_3 (id) values (id) 
    when rn = 4 then into person_4 (id) values (id) 
    when rn = 5 then into person_5 (id) values (id) 
    when rn = 6 then into person_6 (id) values (id) 
    when rn = 7 then into person_7 (id) values (id) 
    when rn = 8 then into person_8 (id) values (id) 
    when rn = 9 then into person_9 (id) values (id) 
    else into person_10 values (id) 
select id, mod(rownum,10) as rn from person;

Alternatively you can drive the allocation by changing the WHEN criteria, for example into buckets:

insert all
    when id < 1000 then into person_1 (id) values (id) 
    when id < 2000 then into person_2 (id) values (id) 
    when id < 3000 then into person_3 (id) values (id) 
    when id < 4000 then into person_4 (id) values (id) 
    when id < 5000 then into person_5 (id) values (id) 
    when id < 6000 then into person_6 (id) values (id) 
    when id < 7000 then into person_7 (id) values (id) 
    when id < 8000 then into person_8 (id) values (id) 
    when id < 9000 then into person_9 (id) values (id) 
    else into person_10 values (id) 
select id from person; 

Obviously, the buckets could depend on a rownum column as in your posted code.


It's worthwhile considering the benefits of separating table creation from data population.

In your posted code, if the routine fails half way through (say the tablespace cannot be extended) you have some tables populated and accessible (because DDL issues commits, so there is no rollback). Once you have fixed the source of the error you would have to clean-up (i.e. drop) the those tables before re-running your routine. Or you could change the code and skip those tables, but that's always problematic, not least because there may be changes to the source table between runs, so you could end up with an inconsistent state.

Creating empty tables first provides a recoverable position. The INSERT ALL is a single statement which is not only more performative than ten selects but it means either all the tables are populated or none are.

like image 108
APC Avatar answered Feb 20 '26 03:02

APC