 

Running Count is Slow in Google Sheets

Here's my way of calculating running count by groups in Sheets:

=LAMBDA(a,INDEX(if(a="",,COUNTIFS(a,a,row(a),"<="&row(a)))))(B4:B)
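Outside Sheets, the COUNTIFS formula above can be approximated by this minimal Python sketch. It assumes the column is a plain list of strings with "" for blank cells; the function name `running_count_countifs` is hypothetical. For every row it checks both criteria against every row in the range, which is where the R² cost comes from.

```python
from typing import List, Optional


def running_count_countifs(values: List[str]) -> List[Optional[int]]:
    """Mimic COUNTIFS(a, a, ROW(a), "<="&ROW(a)) row by row.

    Blank cells stay blank (None), like IF(a="",,...).
    """
    result: List[Optional[int]] = []
    for i, v in enumerate(values):
        if v == "":
            result.append(None)
            continue
        # Both criteria are tested against ALL rows, so N rows cost N * N checks.
        result.append(sum(1 for j, w in enumerate(values) if w == v and j <= i))
    return result


print(running_count_countifs(["a", "b", "b", "b", "a"]))  # [1, 1, 2, 3, 2]
```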

The complexity of this formula is R^2: about 1,000,000 operations for 1K rows. I'd love to find a more efficient formula, and I have tried combinations of LAMBDA and SCAN. So far I've only found a fast way to do it for one group at a time:

=INDEX(IF(B4:B="🌽 Corn",SCAN(0,B4:B,LAMBDA(i,v,if(v="🌽 Corn",i+1,i))),))
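The single-group SCAN can be sketched in Python with `itertools.accumulate`, assuming the column is a list of strings (the name `running_count_one_group` is hypothetical). One pass is linear in N, but repeating it for every group costs roughly N × m, which is why it does not scale to many groups.

```python
from itertools import accumulate
from typing import List, Optional


def running_count_one_group(values: List[str], group: str) -> List[Optional[int]]:
    # SCAN(0, B4:B, LAMBDA(i, v, IF(v = group, i + 1, i))):
    # carry a counter that only increments on rows of `group`.
    # initial=0 adds a leading 0, which [1:] drops again.
    counts = list(accumulate(values, lambda i, v: i + 1 if v == group else i, initial=0))[1:]
    # Outer IF(B4:B = group, ..., ): show the counter only on that group's rows.
    return [c if v == group else None for v, c in zip(values, counts)]


print(running_count_one_group(["a", "b", "b", "b", "a"], "b"))  # [None, 1, 2, 3, None]
```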

Can we do the same for all groups? Do you have an idea?


Note: a script solution would use an object as a hash map to make it fast.
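The question does not include that script, so here is the hash-map idea as a minimal Python sketch rather than the author's code. A dictionary (the analogue of a JavaScript object used as a map) remembers how many times each group has appeared so far, so one pass over the data is enough. The name `running_count_hash` is hypothetical.

```python
from typing import Dict, List, Optional


def running_count_hash(values: List[str]) -> List[Optional[int]]:
    seen: Dict[str, int] = {}  # group -> occurrences seen so far
    result: List[Optional[int]] = []
    for v in values:
        if v == "":
            result.append(None)  # keep blank cells blank
            continue
        seen[v] = seen.get(v, 0) + 1
        result.append(seen[v])
    return result


print(running_count_hash(["a", "b", "b", "b", "a"]))  # [1, 1, 2, 3, 2]
```

Each lookup and update is expected constant time, so the whole pass is roughly linear in N regardless of the number of groups m.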



Legal Tests

We have a list of N items in total, split into m groups. Each group m(i) is a unique value that may repeat at random positions. Sample dataset:

a
b
b
b
a

↑ Sample for 5 items total and 2 groups: N=5; m=2. Groups are "a" and "b"

The task is to find the function which will work faster for different numbers of N and m:

  1. Case #1: 1000+ occurrences of an item from a group m(i)
  2. Case #2: 1000+ different groups m
  3. General case: a significant total number of items, N ~ 50K+
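To reproduce these cases without the linked sheet, a hypothetical generator like the one below can build test columns. It assumes group labels of the form `g0`, `g1`, … drawn uniformly at random; the real test sheet's data may be distributed differently.

```python
import random
from typing import List


def make_dataset(n: int, m: int, seed: int = 0) -> List[str]:
    """Hypothetical generator: n items drawn uniformly at random from m group labels."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    groups = [f"g{k}" for k in range(m)]
    return [rng.choice(groups) for _ in range(n)]


# Case #1: few groups, so each group occurs 1000+ times (about 5000 each here).
case1 = make_dataset(50_000, 10)
# Case #2: 1000+ different groups (about 25 occurrences each here).
case2 = make_dataset(50_000, 2_000)
```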

Playground

Sample Google Sheet with 50K rows of data. Please click the 'Use Template' button:

Test Sheet with 50K values

Speed Results

Tested solutions:

  1. COUNTIFS from the question and COUNTIF from an answer
  2. XLOOKUP from an answer
  3. Complex Match logic from answer
  4. 🏆Sorting logic from the answer

In my environment, the sorting option works faster than the other provided solutions. Test results are here, tested with the code from here.

asked Sep 05 '25 03:09 by Max Makhrov



1 Answer

Sorting algorithm

The idea is to use SORT to reduce the complexity of the calculation. Sorting is built-in functionality, and it works faster than COUNTIFS, which compares every row against every other row.

  1. Sort the column together with its row indexes
  2. Find the place where each new element of a group starts
  3. Create a counter of elements for sorted range
  4. Sort the result back using indexes from step 1


The data is in the range A2:A.

1. Sort + Indexes

=SORT({A2:A,SEQUENCE(ROWS(A2:A))})
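In Python terms, step 1 pairs each value with its 1-based row index and sorts the pairs. This sketch assumes no blank cells, and the function name `sort_with_indexes` is hypothetical. Sorting tuples breaks ties on the index, so rows of the same group stay in their original top-to-bottom order; the later steps rely on that order, and the answer does not say how Sheets' SORT orders ties, so the sketch makes the tie-break explicit.

```python
from typing import List, Tuple


def sort_with_indexes(values: List[str]) -> List[Tuple[str, int]]:
    # {A2:A, SEQUENCE(ROWS(A2:A))}: pair each value with its 1-based row index.
    # Tuples compare by value first, then by index.
    return sorted(zip(values, range(1, len(values) + 1)))


print(sort_with_indexes(["a", "b", "b", "b", "a"]))
# [('a', 1), ('a', 5), ('b', 2), ('b', 3), ('b', 4)]
```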

2. Group Starts

C2:C is the range with the sorted groups (the first column of the step 1 output; D2:D holds the indexes):

=MAP(SEQUENCE(ROWS(A2:A)),LAMBDA(v,if(v=1,0,if(INDEX(C2:C,v)<>INDEX(C2:C,v-1),1,0))))
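A Python sketch of the MAP in step 2, assuming the sorted group column is a list (the name `group_starts` is hypothetical). Note that, as in the formula, the very first row gets 0; step 3 still produces a count of 1 there because its SCAN starts from 0.

```python
from typing import List


def group_starts(sorted_groups: List[str]) -> List[int]:
    # Row 1 gets 0; every later row gets 1 when its group differs from
    # the row above (a new group starts), else 0.
    return [
        0 if i == 0 else int(sorted_groups[i] != sorted_groups[i - 1])
        for i in range(len(sorted_groups))
    ]


print(group_starts(["a", "a", "b", "b", "b"]))  # [0, 0, 1, 0, 0]
```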

3. Counters

Count the items of each group using the column of 0/1 values, where 1 marks the start of a group:

=SCAN(0,F2:F,LAMBDA(ini,v,IF(v=1,1,ini+1)))
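The SCAN in step 3 is a running accumulator that resets to 1 at each group start. A minimal Python sketch (the name `counters` is hypothetical):

```python
from typing import List


def counters(flags: List[int]) -> List[int]:
    # SCAN(0, flags, LAMBDA(ini, v, IF(v = 1, 1, ini + 1)))
    result: List[int] = []
    ini = 0  # SCAN's initial value
    for v in flags:
        ini = 1 if v == 1 else ini + 1
        result.append(ini)
    return result


print(counters([0, 0, 1, 0, 0]))  # [1, 2, 1, 2, 3]
```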

4. Sort the resulting counters back

=SORT(H2:H,D2:D,1)
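Step 4 reorders the counters by the original row indexes saved in step 1. A Python sketch, assuming the counters and indexes are parallel lists (the name `sort_back` is hypothetical); the indexes are unique, so there are no ties to worry about.

```python
from typing import List


def sort_back(counts: List[int], indexes: List[int]) -> List[int]:
    # SORT(H2:H, D2:D, 1): order counters by ascending original row index.
    return [c for _, c in sorted(zip(indexes, counts))]


# Counters [1, 2, 1, 2, 3] belong to sorted rows with original indexes [1, 5, 2, 3, 4].
print(sort_back([1, 2, 1, 2, 3], [1, 5, 2, 3, 4]))  # [1, 1, 2, 3, 2]
```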

The Final Solution

Suggested by Tom Sharpe:

Cut out one stage of the calculation by omitting the MAP and going straight to a SCAN, like this:

=LAMBDA(a,INDEX(if(a="",, LAMBDA(srt, SORT( SCAN(1,SEQUENCE(ROWS(a)), LAMBDA(ini,v,if(v=1,1,if(INDEX(srt,v,1)<>INDEX(srt,v-1,1),1,ini+1)))), index(srt,,2),1) ) (SORT({a,SEQUENCE(ROWS(a))})))))(A2:A)
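The whole final formula can be sketched in Python as well, with the group-start test folded into the scan exactly as Tom Sharpe suggests. This is an illustration of the formula's logic, not a model of how Sheets evaluates it; the name `running_count_sorted` is hypothetical and blank cells are assumed to be "". In this Python version the cost is dominated by the two sorts, O(N log N), plus linear scans.

```python
from typing import List, Optional


def running_count_sorted(values: List[str]) -> List[Optional[int]]:
    n = len(values)
    # SORT({a, SEQUENCE(ROWS(a))}): (value, 1-based index) pairs, ties broken by index.
    srt = sorted(zip(values, range(1, n + 1)))
    counts: List[int] = []
    ini = 1
    for v in range(n):  # 0-based here; the formula's v runs from 1
        # Restart at 1 on the first row or when the sorted value changes,
        # otherwise add 1 to the previous counter.
        if v == 0 or srt[v][0] != srt[v - 1][0]:
            ini = 1
        else:
            ini += 1
        counts.append(ini)
    # SORT(counts, INDEX(srt,,2), 1): back to original row order.
    by_row = [c for _, c in sorted(zip((idx for _, idx in srt), counts))]
    # IF(a = "", , ...): blank cells stay blank.
    return [None if val == "" else c for val, c in zip(values, by_row)]


print(running_count_sorted(["a", "b", "b", "b", "a"]))  # [1, 1, 2, 3, 2]
```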

↑ In my tests this solution is faster.

I packed it into a named function. Sample file with the solution: https://docs.google.com/spreadsheets/d/1OSnLuCh-duW4eWH3Y6eqrJM8nU1akmjXJsluFFEkw6M/edit#gid=0

This image explains the logic and the speed of sorting:


↑ Read more about the speed test.

answered Sep 08 '25 05:09 by Max Makhrov
