Enforcing O(1) get on List

I have a function that chooses a random element of a List and returns it. The list can be, and often is, very long (millions of elements), and this function is called thousands of times a second, so efficiency is important.

My current implementation looks like:

MyClass getRandomElement(List<MyClass> myClasses) {
  return myClasses.get(getRandomNumber(myClasses.size()));
}

There are two problems with this solution.

  1. List.get is not guaranteed to run in O(1). LinkedList for example implements it in O(n).
  2. size is not guaranteed to run in O(1) on all List implementations.

The second point is less of a concern, because every implementation I am aware of implements size() in O(1). The first point is the problematic one.

Is there any way to guarantee, at compile time rather than with a runtime exception, that the implementation is O(1)? I thought of changing the signature to:

MyClass getRandomElement(ArrayList<MyClass> myClasses)

This is too strict, though: I want users to be able to call this function with, for example, an ImmutableList, which is even the recommended choice.

I could assert that the value is an instance of ArrayList or ImmutableList. This would preclude any other O(1) implementation, but I can probably live with that. It is, however, a runtime check rather than a compile-time one, and I am not sure what its overhead is.

Is this the best practice?

asked Dec 14 '22 by Benjy Kessler

2 Answers

From the Javadoc of RandomAccess:

Generic list algorithms are encouraged to check whether the given list is an instanceof this interface before applying an algorithm that would provide poor performance if it were applied to a sequential access list, and to alter their behavior if necessary to guarantee acceptable performance.

Sounds a lot like what you are looking for, provided you can live with a runtime check.
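Such a runtime check could look like this (a sketch only; the generic element type and ThreadLocalRandom stand in for MyClass and getRandomNumber from the question):

```java
import java.util.List;
import java.util.RandomAccess;
import java.util.concurrent.ThreadLocalRandom;

public class RandomPick {
    // Rejects any list that doesn't advertise O(1) access via RandomAccess.
    // Assumes a non-empty list (nextInt(0) would throw otherwise).
    static <T> T getRandomElement(List<T> list) {
        if (!(list instanceof RandomAccess)) {
            throw new IllegalArgumentException("expected a RandomAccess list");
        }
        return list.get(ThreadLocalRandom.current().nextInt(list.size()));
    }
}
```

The instanceof test itself is cheap; the cost is that violations only surface at runtime.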


Actually, it looks like you can do this at compile time, using an intersection type:

<T, L extends List<T> & RandomAccess> T getRandomElement(L list) { ... }

getRandomElement(new ArrayList<String>());   // OK.
getRandomElement(new LinkedList<String>());  // Compiler error.

The downside of this approach is that you need to know the concrete(-ish) type of your list in order to be able to call it. For example, you couldn't call getRandomElement(...) from a method like:

void doSomethingWithRandomElements(List<String> strings) { ... }

which doesn't need the random access property of the list, other than to call getRandomElement(strings).

You would then need to fall back to a runtime check, or propagate the generic constraint up to that method too, and to everything calling that method, and so on. It could get messy quickly.
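To illustrate how the constraint spreads, a sketch (the method names follow the question and the example above; getRandomElement assumes a non-empty list):

```java
import java.util.List;
import java.util.Random;
import java.util.RandomAccess;

public class RandomAccessDemo {
    // The intersection-bounded method from above.
    static <T, L extends List<T> & RandomAccess> T getRandomElement(L list) {
        return list.get(new Random().nextInt(list.size()));
    }

    // The same bound has to reappear on every caller that wants to pass
    // its own parameter through -- it cannot accept a plain List<String>.
    static <L extends List<String> & RandomAccess>
            void doSomethingWithRandomElements(L strings) {
        System.out.println(getRandomElement(strings));
    }
}
```

Every method on the call chain between the list's creation and getRandomElement must repeat the `& RandomAccess` bound, which is the messiness referred to above.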

The choice between compile-time and runtime enforcement is very dependent upon how you'd want to use it.

answered Dec 17 '22 by Andy Turner


You should consider carefully what benefit a restriction of the input has: what kind of risk are you trying to avoid, and what are you trading off? Restricting the parameter type to explicitly declared RandomAccess lists only implies that you lose the ability to pass in any of the random access lists returned by JRE methods, e.g.

  • the result of subList applied on a random access list
  • a random access list wrapped by either Collections.unmodifiableList or Collections.synchronizedList
  • a list returned by Arrays.asList
  • Collections.singletonList(…). (Collections.emptyList() is also a random access list, but it is the one example not suitable for getRandomElement; Collections.nCopies is random access too, though passing it makes little sense.)

All of these lists implement RandomAccess at runtime, but that is not declared in their compile-time type. And in Java 7, you can't even type-cast a list to (List & RandomAccess) when you know that it actually is a random access list.
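From Java 8 onwards, intersection casts do exist, so a caller who knows the list is random access can bridge that declared-type gap. A sketch, where `first` is a hypothetical stand-in for an intersection-bounded method like the getRandomElement from the other answer:

```java
import java.util.Arrays;
import java.util.List;
import java.util.RandomAccess;

public class IntersectionCast {
    // Hypothetical intersection-bounded method, analogous to getRandomElement.
    static <T, L extends List<T> & RandomAccess> T first(L list) {
        return list.get(0);
    }

    public static void main(String[] args) {
        // Arrays.asList returns a random access list, but its static type
        // is just List<String>.
        List<String> words = Arrays.asList("a", "b", "c");
        // Java 8+ intersection cast: the RandomAccess part is checked at runtime.
        String s = first((List<String> & RandomAccess) words);
        System.out.println(s);
    }
}
```

The cast fails with a ClassCastException at runtime if the list turns out not to implement RandomAccess, so this trades the compile-time guarantee back for flexibility.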

Note that even if the list is created as a concrete type implementing RandomAccess, e.g. ArrayList, you are forcing the developer to keep that compile-time type throughout the code, from the list's creation to the point where your method is invoked, instead of using a more abstract type like List for variables and parameters; the inflexibility of that single method spreads to all methods using it.

So you are sacrificing a lot. But what for?

If a developer uses a list which has no random access and invokes your method, the performance is O(n) (even if both size() and get() are O(n), the net complexity of performing one after the other is still O(n)). That shouldn't come as a surprise. An operation being O(1) for a random access list but O(n) for others is seen all the time, including for the intrinsic operation List.get itself.

So if your method performs with an O(n) time complexity due to a developer passing in, e.g. a LinkedList, the problem is the developer’s decision of using a LinkedList in the first place. You shouldn’t try to fix that with your method.

By the way, you could try to mitigate the cost of the non random access case by providing an adapted implementation, e.g.:

public static <T> T getRandomElement(List<? extends T> list) {
    Random r = new Random();
    int size;
    if(list instanceof RandomAccess) {
        size = list.size();
        if(size == 0) throw new NoSuchElementException();
        return list.get(r.nextInt(size));
    }
    // Reservoir sampling: one pass over the iterator, replacing the
    // current pick with probability 1/size at each element; needs
    // neither get(int) nor an up-front size().
    size = 0;
    T picked = null;
    for(T element: list) {
        if(r.nextInt(++size) == 0) {
            picked = element;
        }
    }
    if(size == 0) throw new NoSuchElementException();
    return picked;
}

This doesn't change the O(n) nature of processing a non random access list, as that's impossible; it isn't even an improvement for the LinkedList case, since that class has an O(1) size() method. But imagine a concurrent list with a non-trivial size() method and a weakly consistent iterator: in that case, this method will still return a random element instead of failing, and it would even work for non-List collections. It also reduces the complexity to O(n) if the list happens to have even worse complexity for get or size, assuming a reasonable Iterator.

More complex operations are often implemented in a much simpler way, e.g.:

public static <T> T getRandomElement(List<? extends T> list) {
    if(!(list instanceof RandomAccess)) {
        // One O(n) copy up front yields an O(1) random access list.
        list = new ArrayList<>(list);
    }
    Random r = new Random();
    int size = list.size();
    if(size == 0) throw new NoSuchElementException();
    return list.get(r.nextInt(size));
}

That reduces the problem to a single unavoidable O(n) operation (at the cost of O(n) space, of course). Afterwards it can proceed efficiently without any specialized code. This is crucial when the net complexity would otherwise be worse than O(n) for the non random access list. Have a look at the implementation of Collections.shuffle for a practical example.
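The strategy behind Collections.shuffle can be sketched like this (simplified, not the actual JDK source): swap in place when the list is random access, otherwise do all the work on a copied array and write it back in a single pass:

```java
import java.util.Collections;
import java.util.List;
import java.util.ListIterator;
import java.util.Random;
import java.util.RandomAccess;

public class ShuffleSketch {
    // Fisher-Yates shuffle; the RandomAccess check decides between
    // in-place swaps and the copy-out/copy-back strategy.
    static <T> void shuffleSketch(List<T> list, Random rnd) {
        if (list instanceof RandomAccess) {
            for (int i = list.size() - 1; i > 0; i--) {
                Collections.swap(list, i, rnd.nextInt(i + 1));
            }
        } else {
            Object[] arr = list.toArray();            // one O(n) copy out
            for (int i = arr.length - 1; i > 0; i--) {
                int j = rnd.nextInt(i + 1);
                Object tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
            }
            ListIterator<T> it = list.listIterator(); // one O(n) copy back
            for (Object o : arr) {
                it.next();
                @SuppressWarnings("unchecked")
                T t = (T) o;
                it.set(t);
            }
        }
    }
}
```

Either branch is O(n) overall, but the second avoids the O(n²) that in-place swaps on a sequential access list would cost.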

answered Dec 17 '22 by Holger