I am currently researching for my next project. This is in a pre-planning phase so this question is just to get an overview on existing technology.
I have a directed acyclic graph (DAG) with multiple inputs and output, think artificial neuronal network for now:
The common way to process such a structure is to process the whole network on every (time-)step. I believe that is the method used by frp libraries such as netwire
.
Now I am in the fortunate position that I have a stream of events that each presents the change in one of the input nodes. The idea is that I probably don't have to step each node in the network if I can statically know that a given change will only affect a part of it.
Example
In the image above 5, 7 and 3 are inputs, 11 and 8 are 'hidden' and 2, 9 and 10 are output nodes. A change at node 5 will only affect node 11 and in effect nodes 2, 9 and 10. I won't need to process nodes 7, 3 and 8.
Process this kind of network with as little latency as possible. The size of the graphs will probably reach up to 100k nodes, with a moderate amount of calculation being done per node.
I hope that someone will step up and advertise library X that just gets the job done.
Otherwise my current plan is to derive a calculation per input node from the graph description. Probably I will use the Par
monad so that I won't have to deal with data dependencies myself and still get to benefit from multicore machines.
Par
plan feasible? How much does this depend on the amount of processing needed in each node?Problems like this are usually encoded against either an Applicative
or Arrow
abstraction. I'll only discuss Applicative
. The Applicative
type class, found in Control.Applicative
, allows values and functions to be provided via pure
and functions to be applied to values with <*>
.
class Functor f => Applicative f where
-- | Lift a value.
pure :: a -> f a
-- | Sequential application.
(<*>) :: f (a -> b) -> f a -> f b
Your example graph could be very simply encoded for an Applicative
(replacing each node with addition) as
example1 :: (Applicative f, Num a) => f a -> f a -> f a -> f (a, a, a)
example1 five seven three =
let
eleven = pure (+) <*> five <*> seven
eight = pure (+) <*> seven <*> three
two = pure id <*> eleven
nine = pure (+) <*> eleven <*> eight
ten = pure (+) <*> eleven <*> three
in
pure (,,) <*> two <*> nine <*> ten
The same encoding can be created programmatically from the representation of a graph by traversing the graph such that each node will be visited after all of its dependencies.
There are three optimizations you may desire that can't be implemented for a network encoded using only Applicative
. The general strategy is to encode the problem in terms of Applicative
and a few additional classes as needed for optimization or extra features. You then provide one or more interpreters that implement the necessary classes. This lets you separate the problem from the implementation, allowing you to write your own interpreter or use an existing library like reactive, reactive-banana, or mvc-updates. I am not going to discuss how to write these interpreters or adapt the representation given here to a specific interpreter. I'm only going to discuss the common representation of the program that's needed in order for the underlying interpreter to be able to exploit these optimizations. All three of the libraries I linked can avoid recomputing values and can accommodate the following optimizations.
In the previous example, the intermediate node eleven
is only defined once, but is used in three different places. An implementation of Applicative
won't be able to see through the referential transparency to see that these three eleven
s are all the same. You can either assume that the implementation uses some IO magic to peek through referential transparency or define the network so that an implementation can see that a computation is being reused.
The following is the class of Applicative
Functor
s where the result of a computation can be divided and reused in multiple computations. This class isn't defined in a library anywhere that I know of.
class Applicative f => Divisible f where
(</>) :: f a -> (f a -> f b) -> f b
infixl 2 </>
Your example can then be very simply encoded for a Divisible
Functor
as
example2 :: (Divisible f, Num a) => f a -> f a -> f a -> f (a, a, a)
example2 five seven three =
pure (+) <*> five <*> seven </> \eleven ->
pure (+) <*> seven <*> three </> \eight ->
pure id <*> eleven </> \two ->
pure (+) <*> eleven <*> eight </> \nine ->
pure (+) <*> eleven <*> three </> \ten ->
pure (,,) <*> two <*> nine <*> ten
A typical neuron computes a weighted sum of its inputs and applies a response function to that sum. For a neuron with a large in degree, summing all of its inputs is time consuming. An easy optimization for updating a sum is to subtract the old value and add the new value. This exploits three properties of addition:
Inverse - a * b * b⁻¹ = a
Subtraction is the inverse of adding a number, this inverse allows us to remove the previously added number from the total
Commutativity - a * b = b * a
Addition and subtraction reach the same result regardless of the order they are performed in. This lets us reach the same result when we subtract the old value and add the new value to the total even if the old value wasn't the most recently added value.
Associativity - a * (b * c) = (a * b) * c
Addition and subtraction can be performed in arbitrary groupings and still reach the same result. This lets us reach the same result when we subtract the old value and add the new value to the total even it the old value was added somewhere in the middle of the additions.
Any structure with these properties as well as closure and an identity is an Abelian group. The following dictionary holds enough information for an underlying library to perform the same optimization
data Abelian a = Abelian {
id :: a,
inv :: a -> a,
op :: a -> a -> a
}
This is the class of structures that can total Abelian groups
class Total f where
total :: Abelian a -> [f a] -> f a
Similar optimization is possible for the construction of maps.
Another typical operation in neural networks is to compare a value to a threshold and determine the response entirely based on whether or not the value passed the threshold. If an update to an input doesn't change which side of the threshold the value falls on, the response doesn't change. If the response doesn't change, there's no reason to recalculate all of the downstream nodes. The ability to either detect that there was no change to the threshold Bool
or the response is equality. The following is the class of structures that can exploit equality. stabilize
provides the Eq
instance to the underlying structure.
class Stabilizes f where
stabilize :: Eq a => f a -> f a
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With