Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to evaluate a condition dependent on statistics within groups of categories?

Tags:

loops

excel

vba

Firstly, I'll show a minimal example of my data and the code I have so far, so it will be easier to explain my issue.

Consider the following data:

ID  Esp         DBH     Cod
55  E_grandis   9.00    
55  E_grandis   9.71    7
55  E_grandis   10.00   
55  E_grandis   1.00    7
55  E_grandis   7.00    7
55  E_grandis           1

I am trying to verify if rows with Cod = 7 have values greater than the:

 average of DBH - 1 * standard deviation of DBH.

In the example above the average of DBH is 7.34 and the standard deviation is 3.73. Therefore, DBHs values should not be greater than 3.61 (7.34 - 3.73) when they are tagged Cod 7.

Cells D3 and D6 do not pass in the criteria because their DBHs (C3 and C6) are greater than 3.61. Among all rows with with Cod 7, only C5 is smaller than 3.61.

I wrote the code below which displays a message box when such criteria is not met:

Sub Cod7()

Dim msg As String 'msg box
Dim ID As Range
Dim dbh_stdev As Double 'standard deviation of dbh
Dim dbh_avg As Double 'average of dbh
Dim not_dominated As Double 'criteria threshold (upper bound)
Dim cell_i As Range 'initial of array to compute average and standard deviation
Dim cell_e As Range 'end of array to compute average and standard deviation

    msg = ""
    Set cell_i = Range("C2")
    Set cell_e = Range("C7")

    dbh_stdev = WorksheetFunction.StDev(Range(cell_i, cell_e)) 'dbh standard deviation
    dbh_avg = WorksheetFunction.Average(Range(cell_i, cell_e)) 'dbh average
    not_dominated = dbh_avg - dbh_stdev 'upper bound

'searches cells with cod 7 on column Cod, and it displays a message box if
'DBH is greater than the 'not_dominated' variable value
For Each ID In Range("A2", Range("A2").End(xlDown))
    If ID.Offset(0, 3) = 7 And _
       ID.Offset(0, 2) <> 0 And _
       ID.Offset(0, 2) > not_dominated Then
             msg = msg & "Cod 7 on " & ID.Offset(0, 3).Address() & " is incorrect" & vbLf
    End If
Next ID

If Len(msg) > 0 Then MsgBox msg

End Sub

Now the problem is that in my real data I have more than one category under the column Esp (specie), and I need to evaluate the criteria, calculating the average and standard deviation of DBHs inside each group of specie.
The groups of species are clustered, i.e., one specie occurs through adjacent rows.

For example, this is a minimal data with two categories under the Esp column: E_grandis and E_citriodora.

ID  Esp           DBH    Cod
55  E_grandis     9.00  
55  E_grandis     9.71   7
55  E_grandis     10.00 
55  E_grandis     1.00   7
55  E_grandis     7.00   7
55  E_grandis            1
55  E_citriodora  3.00  
55  E_citriodora  2.00   7
55  E_citriodora  2.00   7
55  E_citriodora         1      
55  E_citriodora         1
55  E_citriodora  0.50   7

The average of DBH in E_citriodora is 1.88 and the standard deviation is 1.03. Rows with Cod = 7 cannot have DBH greater than 0.85 (1.88-1.03). In this case, cells C9 e C10 don't pass the criteria and cell C13 passes.

How can I adapt the code to apply such criteria within groups of 'Esp'?

like image 542
Andre Silva Avatar asked Dec 19 '22 12:12

Andre Silva


2 Answers

I believe the code below does what you want. Please note that this will only work if all species are "grouped" together.

I've added an external loop that allows the code to iterate through all rows that have data (specifically, that have a value in ID).

The initial value of the starting cell (cell_i) is C2 as in the original code but I changed the way it computes the ending cell (cell_e): it is now based on the number of rows in column B that have the same value as cell_i's current species (that's what the CountIf is doing and that's why this only works if all species are clustered together).

That, together with the Set cell_i = cell_e.Offset(1), makes the loop jump from the 1st row of a species to the next one, etc.

For example, the 1st time this runs against your sample data, cell_i will be C2 and cell_e will be C7 because the number of rows that have E_grandis in column B is 6 which, subtracted of 1 and offset from the cell_i's current row means it'll select a cell 5 rows down from the current one.

The 2nd time it'll start from C8 and go through C12. Etc.

Inside the loop body, I've put your original code (mostly «unharmed»). I've just tweaked the For loop so that it iterates through the cells in the range (cell_i to cell_e as captured in the groupRange variable) instead of iterating between all rows that have a value in column A.

I've added a couple of Select calls so that you can follow the values of ccell and groupRange as you step through the code.

Option Explicit

Public Sub Cod7()

    Dim msg As String 'msg box
    Dim dbh_stdev As Double 'standard deviation of dbh
    Dim dbh_avg As Double 'average of dbh
    Dim not_dominated As Double 'criteria threshold (upper bound)
    Dim cell_i As Range 'initial of array to compute average and standard deviation
    Dim cell_e As Range 'end of array to compute average and standard deviation
    Dim ccell As Range 'current cell
    Dim groupRange As Range

    msg = ""
    Set cell_i = Range("C2")

    Do While cell_i.Offset(, -2) <> ""

        Set cell_e = cell_i.Offset(WorksheetFunction.CountIf(Range("B:B"), cell_i.Offset(, -1).Value) - 1)

        Set groupRange = Range(cell_i, cell_e)
        groupRange.Select

        dbh_stdev = WorksheetFunction.StDev(groupRange) 'dbh standard deviation
        dbh_avg = WorksheetFunction.Average(groupRange) 'dbh average
        not_dominated = dbh_avg - dbh_stdev 'upper bound

        'searches cells with cod 7 on column Cod, and it displays a message box if
        'DBH is greater than the 'not_dominated' variable value
        For Each ccell In groupRange
            ccell.Select
            If ccell.Offset(, 1).Value = 7 And _
                ccell.Value <> 0 And _
                ccell.Value > not_dominated Then
                     msg = msg & "Cod 7 on " & ccell.Offset(, 1).Address() & " is incorrect" & vbLf
            End If
        Next

        Set cell_i = cell_e.Offset(1)

    Loop

    If Len(msg) > 0 Then MsgBox msg

End Sub
like image 85
ssarabando Avatar answered May 25 '23 13:05

ssarabando


or you could do away with the whole script alltogether and use this formula in cell E2 (and then copy, paste down):

{=IF(AND(D2=7,C2>AVERAGEIF($B$2:$B$13,B2,$C$2:$C$13)-1* STDEV(IF($B$2:$B$13=B2,$C$2:$C$13))),"warning","")}

Notice the array formula - remember to confirm with ctrl-shift-enter

like image 29
user3616725 Avatar answered May 25 '23 12:05

user3616725