I have a classic dependency solving problem. I thought I was headed in the right direction, but now I've hit a roadblock and I am not sure how to proceed.
In the known universe (the cache of all artifacts and their dependencies), each there is a 1->n relationship between artifacts and versions, and each version may contain a different set of dependencies. For example:
A
1.0.0
B (>= 0.0.0)
1.0.1
B (~> 0.1)
B
0.1.0
1.0.0
Given a set of "demand constraints", I'd like to find the best possible solution (where "best" is the highest possible version that still satisfies all constraints). Here's an example "demand constraints" with the solution:
solve!('A' => '~> 1.0') #=> {"A" => "1.0.1", "B" => "0.1.0"}
In reality, there are significantly more demands:
solve!('A' => '~> 1.0', 'B' => '>= 0.0.0', 'C' => '...', 'D' => '...')
(The versions follow the semantic versioning standard)
The current solution uses back-tracing and is not very performant. I did some digging and found the performance issues resulted due to the size of the universe. I decided to try an alternate approach and construct a "possibility" DAG graph for just the set of demands:
class Graph
def initialize
@nodes = {}
@edges = {}
end
def node(object)
@nodes[object] ||= Set.new
self
end
def edge(a, b)
node(a)
node(b)
@nodes[a].add(b)
self
end
def nodes
@nodes.keys
end
def edges
@nodes.values
end
def adjacencies(node)
@nodes[node]
end
end
I then construct a DAG of all the possible solutions from the universe. This drastically reduces the number of possibilities and gives me an actual graph with real artifact possibilities to work with.
def populate(artifact)
return if loaded?(artifact)
@graph.node(artifact)
artifact.dependencies.each do |dependency|
versions_for(dependency).each do |dependent_artifact|
@graph.edge(artifact, dependent_artifact)
populate(dependent_artifact)
end
end
end
private
def versions_for(dependency)
possibles = @universe.versions(dependency.name, dependency.constraint)
# Short-circuit if there are no versions for this dependency,
# since we know the graph won't be solvable.
raise "No solution for #{dependency}!" if possibles.empty?
possibles
end
So, from the earlier example graph, if I had the demands 'A', '>= 0.0.0'
, my DAG would look like:
+---------+ +---------+
| A-1.0.0 | | A-1.0.1 |
+---------+ +---------+
/ \ |
/ \ |
/ \ |
/ \ |
+---------+ +---------+
| B-1.0.0 | | B-0.1.0 |
+---------+ +---------+
Since the possible values for A-1.0.0 are "any value of B", but the constraints for A-1.0.1 are "any B in the 0.1 series". This is currently working (with a full test suite) as expected.
In other words, the DAG takes the abstract dependency constraints and creates a "real" graph where each edge is a dependency and each vertex (I've called it a node
) is an actual artifact. If a solution exists, it is somewhere in this graph.
And sadly, this is where I get stuck. I'm unable to come up with an algorithm or procedure to find the "best" pathway through this graph. I'm also unsure of a way to detect if the graph isn't solvable.
I did some research, and I thought topological sort (tsort) was the process I needed. However, that algorithm determines the insertion order for dependencies, not the best solution.
I'm fairly certain this is an np-hard problem and will likely have an inefficient runtime. I though using a DAG would reduce the number of comparisons I have to do. Am I wrong in this assumption? Is there a better data structure to use?
I'm not expert on the problem, I'm proposing a complete solution that is not optimal, as there are many things that can be optimized ..
The algorithm is simple, it's ideally a recursive set . intersection DFS :
Algorithm
Def
Define: Name as String on format [ .* ]
Define: Version as String on format [ dd.dd.dd ]
Define: Revision as { Name, Version, Requirement }
Define: Range<T> as { min<T>, max<T> }
Define: Condition as { Name, Range<Version> }
Define: Requirement as Set<Revision> OR as Set<Condition>
Define: Component as { Name, Range<Version>, Requirement }
Define: System as Set<Component>
Input
Input: T as System aka basis
Input: C as Set<Condition> aka conditions to apply
Initialization
Init: S as Set<Condition> = { S[i] as Condition | S[i] = {T[i].Name,T[i].Range} }
Init: Q as Stack<Condition> = { push(q) | q[i] = C[i] }
Process
for (Condition c in C)
{
S.find(c.Name).apply(c)
}
While (Q.size > 0)
{
Condition q = Q.pop()
switch (T.assert(S.find(q.Name),q))
{
case VALID:
S.find(q.Name).apply(q)
q.push(S.find(q.Name).Requirement)
case INVALID:
S.find(q.Name).set(INVALID)
case IDENTICAL:
case SKIP:
}
}
return S aka Solution
Operations
Stack.push
insert an item at the front of a stack
Stack.pop
remove an item from the front of a stack
System.assert(Condition a, Condition b):
if (a is INVALID) then return SKIP
else if (b.Range = a.Range) then IDENTICAL
else if (b.Range - a.Range = {}) then VALID
else INVALID
Set.find(x)
search an item based on condition x
Condition.apply(Condition b) = { this.Name, intersection(this.Range,b.Range) }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With