
SQL over clause - dividing partition into numbered sub-partitions

I have a challenge that I've come across on multiple occasions but have never been able to find an efficient solution to. Imagine I have a large table with data about, e.g., bank accounts and their possible revolving moves between debit and credit:

AccountId DebitCredit AsOfDate
--------- ----------- ----------
aaa       d           2018-11-01
aaa       d           2018-11-02
aaa       c           2018-11-03
aaa       c           2018-11-04
aaa       c           2018-11-05
bbb       d           2018-11-02
ccc       c           2018-11-01
ccc       d           2018-11-02
ccc       d           2018-11-03
ccc       c           2018-11-04
ccc       d           2018-11-05
ccc       c           2018-11-06

In the example above, I would like to assign sub-partition numbers to the combination of AccountId and DebitCredit, where the number starts at 1 for each account and is incremented each time DebitCredit shifts. In other words, I would like this result:

AccountId DebitCredit AsOfDate   PartNo
--------- ----------- ---------- ------
aaa       d           2018-11-01      1
aaa       d           2018-11-02      1
aaa       c           2018-11-03      2
aaa       c           2018-11-04      2
aaa       c           2018-11-05      2

bbb       d           2018-11-02      1

ccc       c           2018-11-01      1
ccc       d           2018-11-02      2
ccc       d           2018-11-03      2
ccc       c           2018-11-04      3
ccc       d           2018-11-05      4
ccc       c           2018-11-06      5
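
To pin down the rule independently of any SQL dialect, here is a minimal procedural sketch in Python. The function name `number_runs` and the tuple layout are assumptions made for illustration only; this states the desired numbering, it is not a proposal for how to do it efficiently on the server.

```python
from itertools import groupby

def number_runs(rows):
    """Append PartNo to (AccountId, DebitCredit, AsOfDate) tuples.

    PartNo starts at 1 for each account and increases by 1 every time
    DebitCredit differs from the previous row of the same account.
    """
    out = []
    # Order by account, then date (ISO date strings sort chronologically).
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, account_rows in groupby(ordered, key=lambda r: r[0]):
        part_no = 0
        previous = None
        for account_id, debit_credit, as_of_date in account_rows:
            if debit_credit != previous:
                part_no += 1
                previous = debit_credit
            out.append((account_id, debit_credit, as_of_date, part_no))
    return out

# Account ccc from the example above.
sample = [
    ("ccc", "c", "2018-11-01"),
    ("ccc", "d", "2018-11-02"),
    ("ccc", "d", "2018-11-03"),
    ("ccc", "c", "2018-11-04"),
    ("ccc", "d", "2018-11-05"),
    ("ccc", "c", "2018-11-06"),
]

for row in number_runs(sample):
    print(*row)
```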

I cannot figure out how to do it quickly and efficiently. The operation has to be done daily on tables with millions of rows.

In this example it is guaranteed that we will have consecutive rows for all accounts. However, a customer might of course open an account on the 15th of the month and/or close it on the 26th.

The challenge is to be solved on an MSSQL 2016 server, but a solution that would also work on 2012 (and maybe even 2008 R2) would be nice.

As you can imagine, there's no way of telling in advance whether an account will have only debit rows, only credit rows, or will revolve every day.

Stanley Gade asked Nov 12 '18 07:11



1 Answer

If you have SQL Server 2012+, you can use lag() and a windowed running sum to get this:

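select *,
       -- running total of the change flags = partition number within the account
       sum(PartNoAdd) over (partition by AccountId order by AsOfDate asc) as PartNo_calc
from
(
    select *,
           -- 0 if DebitCredit equals the previous row's value for this account, else 1
           -- (the first row has no previous row, so lag() is null and the flag is 1)
           case when DebitCredit = lag(DebitCredit, 1) over (partition by AccountId order by AsOfDate asc)
                then 0 else 1 end as PartNoAdd
    from t
) t2
order by AccountId asc, AsOfDate asc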

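In the inner query, PartNoAdd checks whether the previous DebitCredit for this account is the same as the current one. If it is, it returns 0 (nothing should be added); otherwise it returns 1. The first row of each account has no previous row, so lag() returns null and PartNoAdd is 1.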

Then the outer query computes a running sum of PartNoAdd for each account, ordered by AsOfDate, and that running sum is the partition number.
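
To see the two steps working on the question's sample data, here is a small sketch that runs the same query through Python's built-in `sqlite3` module. SQLite (3.25 or later, for window functions) is only a stand-in for SQL Server here, and the helper name `part_numbers` is made up for the illustration. Nothing in this check says anything about performance on millions of rows.

```python
import sqlite3

# Sample rows from the question: (AccountId, DebitCredit, AsOfDate).
ROWS = [
    ("aaa", "d", "2018-11-01"), ("aaa", "d", "2018-11-02"),
    ("aaa", "c", "2018-11-03"), ("aaa", "c", "2018-11-04"),
    ("aaa", "c", "2018-11-05"),
    ("bbb", "d", "2018-11-02"),
    ("ccc", "c", "2018-11-01"), ("ccc", "d", "2018-11-02"),
    ("ccc", "d", "2018-11-03"), ("ccc", "c", "2018-11-04"),
    ("ccc", "d", "2018-11-05"), ("ccc", "c", "2018-11-06"),
]

# Same logic as the query above: flag each change with lag(), then take
# a running sum of the flags per account.
QUERY = """
select AccountId, DebitCredit, AsOfDate,
       sum(PartNoAdd) over (partition by AccountId order by AsOfDate) as PartNo
from (
    select *,
           case when DebitCredit = lag(DebitCredit, 1)
                         over (partition by AccountId order by AsOfDate)
                then 0 else 1 end as PartNoAdd
    from t
) t2
order by AccountId, AsOfDate
"""

def part_numbers(rows):
    """Load rows into an in-memory table t and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (AccountId text, DebitCredit text, AsOfDate text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

result = part_numbers(ROWS)
for row in result:
    print(*row)
```

The last column of the printed rows can be compared directly against the PartNo column requested in the question.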

George Menoutis answered Oct 13 '22 11:10