Efficient algorithm for detecting different elements in a collection

Imagine you have a set of five elements (A-E), each with several observations of a measured numeric property (for example, "heart rate"):

A = {100, 110, 120, 130}
B = {110, 100, 110, 120, 90}
C = { 90, 110, 120, 100}
D = {120, 100, 120, 110, 110, 120}
E = {110, 120, 120, 110, 120}

First, I have to detect whether there are significant differences in the mean levels. So I run a one-way ANOVA using the statistics package provided by Apache Commons Math. No problems so far: I obtain a boolean that tells me whether differences are found or not.
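For readers who want to see what the ANOVA step computes, here is a minimal sketch of the one-way ANOVA F-statistic in plain Java. The class `AnovaSketch` and its method `fStatistic` are hypothetical names used only for illustration; they are not part of Apache Commons Math, and the sketch stops at the F-value (it does not compute the p-value that the library's test compares against the significance level).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Apache Commons Math.
final class AnovaSketch {

    /** F = (between-group mean square) / (within-group mean square). */
    static double fStatistic(List<double[]> groups) {
        int total = 0;
        double grandSum = 0.0;
        for (double[] g : groups) {
            total += g.length;
            for (double v : g) grandSum += v;
        }
        double grandMean = grandSum / total;

        double ssBetween = 0.0; // variation of group means around the grand mean
        double ssWithin = 0.0;  // variation of observations around their group mean
        for (double[] g : groups) {
            double sum = 0.0;
            for (double v : g) sum += v;
            double mean = sum / g.length;
            ssBetween += g.length * (mean - grandMean) * (mean - grandMean);
            for (double v : g) ssWithin += (v - mean) * (v - mean);
        }
        int dfBetween = groups.size() - 1;
        int dfWithin = total - groups.size();
        return (ssBetween / dfBetween) / (ssWithin / dfWithin);
    }

    public static void main(String[] args) {
        // The five elements A-E from the question.
        List<double[]> groups = Arrays.asList(
            new double[] {100, 110, 120, 130},
            new double[] {110, 100, 110, 120, 90},
            new double[] {90, 110, 120, 100},
            new double[] {120, 100, 120, 110, 110, 120},
            new double[] {110, 120, 120, 110, 120});
        System.out.println("F = " + fStatistic(groups));
    }
}
```

A large F-value means the group means vary more than the spread inside each group would explain; turning it into a yes/no decision still requires the F distribution, which is what the library call handles.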

Second, if differences are found, I need to know which element (or elements) is different from the rest. I plan to use unpaired t-tests, comparing each pair of elements (A with B, A with C, ..., D with E), to know whether one element is different from the other. So, at this point I have a list of the elements that present significant differences from others, for example:

C is different from B
C is different from D
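As an illustration of what each pairwise comparison involves, the following is a minimal sketch of the unpaired t-statistic without the equal-variance assumption (Welch's form), together with its approximate degrees of freedom. `WelchSketch` and its methods are hypothetical names, not a library API, and the sketch deliberately stops before the p-value, which needs the t distribution.

```java
// Hypothetical helper for illustration; not part of Apache Commons Math.
final class WelchSketch {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    // Difference of means divided by its estimated standard error.
    static double tStatistic(double[] a, double[] b) {
        double se = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    static double degreesOfFreedom(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
             / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    public static void main(String[] args) {
        // Elements C and B from the question.
        double[] c = {90, 110, 120, 100};
        double[] b = {110, 100, 110, 120, 90};
        System.out.println("t  = " + tStatistic(c, b));
        System.out.println("df = " + degreesOfFreedom(c, b));
    }
}
```

With n elements there are n(n-1)/2 such comparisons, so five elements need ten tests.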

But I need a generic algorithm to efficiently determine, from that information, which element is different from the others (C in the example, but there could be more than one).

Leaving statistical issues aside, the question could be stated in general terms as: "Given the equality or inequality of each pair of elements in a collection, how can you determine which element or elements are different from the others?"

This seems to be a problem where graph theory could be applied. I am implementing this in Java, if that is useful.
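One way to read the problem as a graph, sketched here under assumptions: treat each element as a vertex, add an edge for every pair found to be different, and report the vertices with the highest degree. The class `DegreeSketch`, the method `mostDifferent`, and the encoding of A-E as indices 0-4 are all hypothetical choices made for this illustration; whether "highest degree" is the right criterion is itself an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class DegreeSketch {

    /**
     * @param n              number of elements, indexed 0..n-1
     * @param differentPairs pairs {i, j} that were found to be different
     * @return indices with the highest number of differences (empty if none differ)
     */
    static List<Integer> mostDifferent(int n, int[][] differentPairs) {
        int[] degree = new int[n];
        for (int[] pair : differentPairs) {
            degree[pair[0]]++;
            degree[pair[1]]++;
        }
        int max = 0;
        for (int d : degree) max = Math.max(max, d);

        List<Integer> result = new ArrayList<>();
        if (max == 0) return result; // no differences at all
        for (int i = 0; i < n; i++) {
            if (degree[i] == max) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // A=0, B=1, C=2, D=3, E=4; "C differs from B" and "C differs from D".
        int[][] pairs = { {2, 1}, {2, 3} };
        System.out.println(mostDifferent(5, pairs)); // prints [2], i.e. C
    }
}
```

Counting degrees is linear in the number of differing pairs, so the pairwise tests that produce those pairs dominate the cost.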

Edit: Elements are people and the measured values are the times needed to complete a task. I need to detect who is taking too much or too little time to complete the task, as part of a fraud detection system.


asked Feb 24 '10 13:02

Guido

1 Answer

Just in case anyone is interested in the final code, it uses Apache Commons Math for the statistical operations and Trove to work with collections of primitive types.

It looks for the element(s) with the highest degree (the idea is based on comments made by @Pace and @Aniko, thanks).

I think the final algorithm is O(n^2) in the number of elements, since it runs a t-test on every pair; suggestions are welcome. It should work for any problem involving one qualitative and one quantitative variable, assuming the observations are normally distributed.

import gnu.trove.iterator.TIntIntIterator;
import gnu.trove.map.TIntIntMap;
import gnu.trove.map.hash.TIntIntHashMap;
import gnu.trove.procedure.TIntIntProcedure;
import gnu.trove.set.TIntSet;
import gnu.trove.set.hash.TIntHashSet;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math.MathException;
import org.apache.commons.math.stat.inference.OneWayAnova;
import org.apache.commons.math.stat.inference.OneWayAnovaImpl;
import org.apache.commons.math.stat.inference.TestUtils;


public class TestMath {
    private static final double SIGNIFICANCE_LEVEL = 0.001; // 99.9%

    public static void main(String[] args) throws MathException {
        double[][] observations = {
           {150.0, 200.0, 180.0, 230.0, 220.0, 250.0, 230.0, 300.0, 190.0 },
           {200.0, 240.0, 220.0, 250.0, 210.0, 190.0, 240.0, 250.0, 190.0 },
           {100.0, 130.0, 150.0, 180.0, 140.0, 200.0, 110.0, 120.0, 150.0 },
           {200.0, 230.0, 150.0, 230.0, 240.0, 200.0, 210.0, 220.0, 210.0 },
           {200.0, 230.0, 150.0, 180.0, 140.0, 200.0, 110.0, 120.0, 150.0 }
        };

        final List<double[]> classes = new ArrayList<double[]>();
        for (int i=0; i<observations.length; i++) {
            classes.add(observations[i]);
        }

        OneWayAnova anova = new OneWayAnovaImpl();
//      double fStatistic = anova.anovaFValue(classes); // F-value
//      double pValue = anova.anovaPValue(classes);     // P-value

        boolean rejectNullHypothesis = anova.anovaTest(classes, SIGNIFICANCE_LEVEL);
        System.out.println("reject null hypothesis " + (100 - SIGNIFICANCE_LEVEL * 100) + "% = " + rejectNullHypothesis);

        // differences are found, so make t-tests
        if (rejectNullHypothesis) {
            TIntSet aux = new TIntHashSet();
            TIntIntMap fraud = new TIntIntHashMap();

            // i vs j unpaired t-tests - O(n^2)
            for (int i=0; i<observations.length; i++) {
                for (int j=i+1; j<observations.length; j++) {
                    boolean different = TestUtils.tTest(observations[i], observations[j], SIGNIFICANCE_LEVEL);
                    if (different) {
                        // aux.add() returns false once the element has already been seen,
                        // so an element is only counted from its second difference onwards
                        if (!aux.add(i)) {
                            fraud.adjustOrPutValue(i, 1, 1);
                        }
                        if (!aux.add(j)) {
                            fraud.adjustOrPutValue(j, 1, 1);
                        }
                    }
                }
            }

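            // TIntIntHashMap is not ordered, so find the highest degree explicitly
            int highest = 0;
            for (int degree : fraud.values()) {
                highest = Math.max(highest, degree);
            }
            final int max = highest;
            // Keep only those with the highest degree
            // (retainEntries keeps the entries for which execute returns true)
            fraud.retainEntries(new TIntIntProcedure() {
                @Override
                public boolean execute(int a, int b) {
                    return b == max;
                }
            });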

            // If more than half of the elements are different
            // then they are not really different (?)
            if (fraud.size() > observations.length / 2) {
                fraud.clear();
            }

            // output
            TIntIntIterator it = fraud.iterator();
            while (it.hasNext()) {
                it.advance();
                System.out.println("Element " + it.key() + " has significant differences");
            }
        }
    }
}


answered Sep 30 '22 03:09

Guido

Tags: java, algorithm, collections, statistics, anova
