I have an ordered set of data points stored as a TreeSet<DataPoint>
. Each data point has a position
and a Set
of Event
objects (HashSet<Event>
).
There are 4 possible Event
objects A
, B
, C
, and D
. Every DataPoint
has 2 of these, e.g. A
and C
, except the first and last DataPoint
objects in the set, which have T
of size 1.
My algorithm is to find the probability of a new DataPoint
Q
at position x
having Event
q
in this set.
I do this by calculating a value S
for this data set, then adding Q
to the set and calculating S
again. I then divide the second S
by the first to isolate the probability for the new DataPoint
Q
.
The formula for calculating S
is:
http://mathbin.net/equations/105225_0.png
where
http://mathbin.net/equations/105225_1.png
http://mathbin.net/equations/105225_2.png
for http://mathbin.net/equations/105225_3.png
and
http://mathbin.net/equations/105225_4.png
http://mathbin.net/equations/105225_5.png is an expensive probability function that only depends on its arguments and nothing else (and http://mathbin.net/equations/105225_6.png), http://mathbin.net/equations/105225_7.png is the last DataPoint
in the set (righthand node), http://mathbin.net/equations/105225_8.png is the first DataPoint
(lefthand node), http://mathbin.net/equations/105225_9.png is the rightmost DataPoint
that isn't the node, http://mathbin.net/equations/105225_10.png is a DataPoint
,http://mathbin.net/equations/105225_12.png is the Set
of events for this DataPoint
.
So the probability for Q
with Event
q
is:
http://mathbin.net/equations/105225_11.png
I implemented this algorithm in Java like so:
public class ProbabilityCalculator {
private Double p(DataPoint right, Event rightEvent, DataPoint left, Event leftEvent) {
// do some stuff
}
private Double f(DataPoint right, Event rightEvent, NavigableSet<DataPoint> points) {
DataPoint left = points.lower(right);
Double result = 0.0;
if(left.isLefthandNode()) {
result = 0.25 * p(right, rightEvent, left, null);
} else if(left.isQ()) {
result = p(right, rightEvent, left, left.getQEvent()) * f(left, left.getQEvent(), points);
} else { // if M_k
for(Event leftEvent : left.getEvents())
result += p(right, rightEvent, left, leftEvent) * f(left, leftEvent, points);
}
return result;
}
public Double S(NavigableSet<DataPoint> points) {
return f(points.last(), points.last().getRightNodeEvent(), points)
}
}
So to find the probability of Q
at x
with q
:
Double S1 = S(points);
points.add(Q);
Double S2 = S(points);
Double probability = S2/S1;
As the implementation stands at the moment it follows the mathematical algorithm closely. However this turns out not to be a particularly good idea in practice, as f
calls itself twice for each DataPoint
. So for http://mathbin.net/equations/105225_9.png, f
is called twice, then for the n-1
f
is called twice again for each of the previous calls, and so on and so forth. This leads to a complexity of O(2^n)
which is pretty terrible considering there can be over 1000 DataPoints
in each Set
. Because p()
is independent of everything except its parameters I have included a caching function where if p()
has already been calculated for these parameters it just returns the previous result, but this doesn't solve the inherent complexity problem. Am I missing something here with regards to repeat computations, or is the complexity unavoidable in this algorithm?
Bottom-up. Sometimes the best way to improve the efficiency of a recursive algorithm is to not use recursion at all. In the case of generating Fibonacci numbers, an iterative technique called the bottom-up approach can save us both time and space.
Recursion is the technique of making a function call itself. This technique provides a way to break complicated problems down into simple problems which are easier to solve.
Iteration can be used to repeatedly execute a set of statements without the overhead of function calls and without using stack memory. Iteration is faster and more efficient than recursion.
Method 1: Recursion Tree Method We take the sum of each value of nodes to find the total complexity of the algorithm. Draw a recursion tree based on the given recurrence relation. Determine the number of levels, cost at each level and cost of the last level. Add the cost of all levels and simplify the expression.
You also need to memoize f
on the first 2 arguments (the 3rd is always passed through, so you don't need to worry about that). This will reduce the time complexity of your code from O(2^n) to O(n).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With