Getting counts/totals at each level of a hierarchical query using CONNECT BY

Tags:

I am having a heck of a time with this. I am trying to write a query (using Oracle), against a table with a recursive relationship (hierarchical) and get the total number of records stored in another table at and below each node in the tree. The other table only has records associated with the leaf nodes. However, I want to get totals at and below each node in the tree. For example, say I have two tables. DIRS contains the directory names and a recursive relationship identifying the structure of the directories, and FILES contains file information with a foreign key to DIRS indicating the directory the file resides in:

DIRS
====
DIR_ID 
PARENT_DIR_ID
DIR_NAME

FILES
=====
FILE_ID
FILE_NAME
DIR_ID
FILE_SIZE

If DIRS contains:

DIR_ID   PARENT_DIR_ID   DIR_NAME
======   =============   ========
1                        ROOT
2        1               DIR1_1
3        1               DIR1_2
4        2               DIR2_1
5        2               DIR2_2

and FILES contains

FILE_ID   FILE_NAME   DIR_ID   FILE_SIZE
=======   =========   ======   =========
1         test1.txt   5        100
2         test2.txt   5        200
3         test5.txt   5         50 
4         test3.txt   3        300
5         test4.txt   3        300
6         test6.txt   4        100

I want a query that returns the path along with the number of files in or below each node in the hierarchy. Basically a rollup of the number of files. So the query result would look something like:

Path                    File_Count
=====                   ===========
/ROOT                   6
/ROOT/DIR1_1            4
/ROOT/DIR1_1/DIR2_1     1
/ROOT/DIR1_1/DIR2_2     3
/ROOT/DIR1_2            2

UPDATE SQL script to create the tables with example data to match the above:

create table DIRS (dir_id number(38) primary key
    , parent_dir_id number(38) null references DIRS(dir_id)
    , dir_name varchar2(128) not null);

create table FILES (file_id number(38) primary key
    , file_name varchar2(128) not null
    , dir_id number(38) not null references DIRS(dir_id)
    , file_size number not null
    , unique (dir_id, file_name));

insert into DIRS 
select 1, null, 'ROOT' from dual
union all select 2, 1, 'DIR1_1' from dual 
union all select 3, 1, 'DIR1_2' from dual 
union all select 4, 2, 'DIR2_1' from dual 
union all select 5, 2, 'DIR2_2' from dual;

insert into files
select 1, 'test1.txt', 5, 100 from dual
union all select 2, 'test2.txt', 5, 200 from dual
union all select 3, 'test5.txt', 5, 50 from dual
union all select 4, 'test3.txt', 3, 300 from dual
union all select 5, 'test4.txt', 3, 300 from dual
union all select 6, 'test6.txt', 4, 100 from dual;

commit;

602

asked Sep 10 '12 22:09

GregH

2 Answers

This one is pretty straightforward:

09:38:54 HR@vm_xe> l                                      
  1  select sys_connect_by_path(dp.dir_name, '/') path    
  2         ,(select count(file_id)                       
  3             from dirs dc                              
  4                  ,files f                             
  5            where f.dir_id(+) = dc.dir_id              
  6          connect by prior dc.dir_id = dc.parent_dir_id
  7            start with dc.dir_id = dp.dir_id           
  8          ) count                                      
  9    from dirs dp                                       
 10    connect by prior dp.dir_id = dp.parent_dir_id      
 11*   start with dp.parent_dir_id is null                
09:38:55 HR@vm_xe> /                                      

PATH                                COUNT                 
------------------------------ ----------                 
/ROOT                                   6                 
/ROOT/DIR1_1                            4                 
/ROOT/DIR1_1/DIR2_1                     1                 
/ROOT/DIR1_1/DIR2_2                     3                 
/ROOT/DIR1_2                            2                 

5 rows selected.                                          

Elapsed: 00:00:00.02

136

answered Sep 28 '22 17:09

Kirill Leontev

select sys_connect_by_path(D.dir_name, '/'), S.count_distinct_file_id
from DIRS D
inner join (select subtree_root_dir_id
            , count(distinct file_id) count_distinct_file_id
        from (select distinct connect_by_root D.DIR_ID subtree_root_dir_id
                    , F.file_id 
                from DIRS D
                left outer join FILES F on F.dir_id = D.dir_id
                start with 1=1 connect by prior D.dir_id = D.parent_dir_id)
        group by subtree_root_dir_id) S
    on D.dir_id = S.subtree_root_dir_id
start with D.dir_id = 1 connect by prior D.dir_id = D.parent_dir_id

Gives the results you asked for, but my gut says I am not seeing something, and that the query can be much simpler. (Please do not accept this answer until a few days have passed, in hopes that some one submits a better answer.)

answered Sep 28 '22 16:09

Shannon Severance

Related questions
                            
                                Datatype/structure to store timezone offset in MySQL
                            
                                Is there anything like Parallel CURSOR?
                            
                                SQL query to search by multiple tags with relevance sorting
                            
                                Best practices for ordering columns in SQL when creating the table
                            
                                Is the connectionString within the project safe when deploying to the cloud?
                            
                                Recommended approach to insert many rows with Castle ActiveRecord and ignore any dupes
                            
                                How to identify high-load SQL in Oracle, using oracle views?
                            
                                Postgres - Is this the right way to create a partial index on a boolean column?
                            
                                Simple subquery in Access
                            
                                Recurring Events, SQL Query
                            
                                DBMS_XPLAN.DISPLAY_CURSOR vs Explain Plan if not using gather_plan_statistics hint
                            
                                Finding Duplicate Orders (by time proximity)
                            
                                Oracle SQL using Like for a wildcard
                            
                                Many-column UPDATE-JOIN with many-ISNULL takes a long time?
                            
                                Using a SQL DataReader to capture inserted or deleted values
                            
                                How to get elasticsearch to perform similar to SQL 'LIKE'
                            
                                Advanced SQL in Rails
                            
                                LINQ to SQL - updating records
                            
                                When does sql exclusively lock a row in an update statement?
                            
                                Optimize SQL Query on SQLite3 by using indexes

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With