Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient data structure for the average of a sequence

I need to design a data structure that can support efficiently the following operations on a stored (as I see fit) sequence of numbers:

  • Add the integer x to the first i elements of the sequence
  • Append an integer k to the end of the sequence
  • Remove the last element of the sequence
  • Retrieve the average of all the elements in the sequence

Example

Starting with an empty sequence []

  • Append 0 ([0])
  • Append 5 ([0, 5])
  • Append 6 ([0, 5, 6])
  • Add 3 to the first 2 elements in the sequence ([3, 8, 6])
  • Retrieve the average 5.66 ([3, 8, 6])
  • Remove the last element ([3, 8])
  • Retrieve the average 5.5 ([3, 8])

Previous work

I thought about using Fenwick Trees (Topcoder Editorial) but for that I would need to specify the maximum size of the sequence for the initialisation of the Fenwick tree which is don't necessarily know. But if I have a maximum number of elements that the sequence can support I can support those operations on O(lg N) if I also save the sum of all the elements in the sequence.

Edit: The question is for a Codeforces problem and I need sub-linear running time for all the operations and because adding to the first elements can be, in the worst case, be the same as adding to the whole sequence

like image 253
Gustavo Torres Avatar asked Mar 19 '13 21:03

Gustavo Torres


3 Answers

Have you considered using a linked list plus the current length and sum? For each operation you can maintain the current average with constant extra work (you know the length of the list and the sum, and all operations change those two values in a constant way).

The only non-constant operation would be adding a constant to an arbitrary prefix, which would take time proportional to the size of the prefix since you'd need to adjust each number.

To make all operations constant (amortized) constant requires more work. Instead of using a doubly-linked list, back the array with a stack. Each slot i in the array now contains both the number at i and the constant that was to be added to every element up to i. (Note that if you say "add 3 to every element up to element 11," slot 11 would contain the number 3 but slots 0-10 would be empty.) Now every operation is as it was before, except that appending a new element involves the standard array-doubling trick, and when you pop the last element off the end of the queue you need to (a) add in the constant at that slot, and (b) add the constant value from slot i to the constant for slot i-1. So for your example:

Append 0: [(0,0)], sum 0, length 1

Append 5: ([(0,0),(5,0)], sum 5, length 2

Append 6: [(0,0),(5,0),(6,0)], sum 11, length 3

Add 3 to the first 2 elements in the sequence: [(0,0),(5,3),(6,0)], sum 17, length 3

Retrieve the average 5.66

Remove the last element [(0,0),(5,3)], sum 11, length 2

Retrieve the average 5.5

Remove the last element [(0,3)], sum 3, length 1

Here's some Java code that illustrates the idea perhaps more clearly:

class Averager {
  private int sum;
  private ArrayList<Integer> elements = new ArrayList<Integer>();
  private ArrayList<Integer> addedConstants = new ArrayList<Integer>();

  public void addElement(int i) {
    elements.add(i);
    addedConstants.add(0);
    sum += i;
  }

  public void addToPrefix(int k, int upto) {
    addedConstants.set(upto, addedConstants.get(upto) + k);
    sum += k * (upto + 1);
    // Note: assumes prefix exists; in real code handle an error
  }

  public int pop() {
    int lastIndex = addedConstants.length() - 1;

    int constantToAdd = addedConstants.get(lastIndex);
    int valueToReturn = elements.get(lastIndex);
    addedConstants.set(
      lastIndex-1,
      addedConstants.get(lastIndex-1) + constantToAdd);
    sum -= valueToReturn;
    elements.remove(lastIndex);
    addedConstants.remove(lastIndex);
    return valueToReturn + constantToAdd;
    // Again you need to handle errors here as well, particularly where the stack
    // is already empty or has exactly one element
  }

  public double average() {
    return ((double) sum) / elements.length();
  }
}
like image 77
jacobm Avatar answered Sep 22 '22 03:09

jacobm


Sounds like a Doubly Linked List with maintaining a head and tail reference, as well as the current sum and count.

Add the integer x to the first i elements of the sequence

Start at *head, add x, next item. Repeat i times. sum += i*x

Append an integer k to the end of the sequence

Start at *tail, make new item with head = tail, tail = null. Update *tail, sum, and count accordingly.

Remove the last element of the sequence

Update *tail to *tail->prev. Update sum, decrement count

Retrieve the average 5.5 ([3, 8])

Return sum / count

like image 24
Matt Dodge Avatar answered Sep 22 '22 03:09

Matt Dodge


This data structure can just be a tuple (N, S) where N is the count and S is the sum and a stack of numbers. Nothing fancy. All operations are O(1) except for the first which is O(i).

like image 42
Timothy Shields Avatar answered Sep 22 '22 03:09

Timothy Shields