Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Solving dependency constraints

I have a classic dependency solving problem. I thought I was headed in the right direction, but now I've hit a roadblock and I am not sure how to proceed.

Background

In the known universe (the cache of all artifacts and their dependencies), each there is a 1->n relationship between artifacts and versions, and each version may contain a different set of dependencies. For example:

A
  1.0.0
    B (>= 0.0.0)
  1.0.1
    B (~> 0.1)
B
  0.1.0
  1.0.0

Given a set of "demand constraints", I'd like to find the best possible solution (where "best" is the highest possible version that still satisfies all constraints). Here's an example "demand constraints" with the solution:

solve!('A' => '~> 1.0') #=> {"A" => "1.0.1", "B" => "0.1.0"}

In reality, there are significantly more demands:

solve!('A' => '~> 1.0', 'B' => '>= 0.0.0', 'C' => '...', 'D' => '...')

(The versions follow the semantic versioning standard)

I Tried

The current solution uses back-tracing and is not very performant. I did some digging and found the performance issues resulted due to the size of the universe. I decided to try an alternate approach and construct a "possibility" DAG graph for just the set of demands:

class Graph
  def initialize
    @nodes = {}
    @edges = {}
  end

  def node(object)
    @nodes[object] ||= Set.new
    self
  end

  def edge(a, b)
    node(a)
    node(b)

    @nodes[a].add(b)

    self
  end

  def nodes
    @nodes.keys
  end

  def edges
    @nodes.values
  end

  def adjacencies(node)
    @nodes[node]
  end
end

I then construct a DAG of all the possible solutions from the universe. This drastically reduces the number of possibilities and gives me an actual graph with real artifact possibilities to work with.

def populate(artifact)
  return if loaded?(artifact)

  @graph.node(artifact)

  artifact.dependencies.each do |dependency|
    versions_for(dependency).each do |dependent_artifact|
      @graph.edge(artifact, dependent_artifact)
      populate(dependent_artifact)
    end
  end
end

private

def versions_for(dependency)
  possibles = @universe.versions(dependency.name, dependency.constraint)

  # Short-circuit if there are no versions for this dependency,
  # since we know the graph won't be solvable.
  raise "No solution for #{dependency}!" if possibles.empty?

  possibles
end

So, from the earlier example graph, if I had the demands 'A', '>= 0.0.0', my DAG would look like:

+---------+   +---------+
| A-1.0.0 |   | A-1.0.1 |
+---------+   +---------+
       /  \        |
      /    \       |
     /      \      |
    /        \     |
+---------+   +---------+
| B-1.0.0 |   | B-0.1.0 |
+---------+   +---------+

Since the possible values for A-1.0.0 are "any value of B", but the constraints for A-1.0.1 are "any B in the 0.1 series". This is currently working (with a full test suite) as expected.

In other words, the DAG takes the abstract dependency constraints and creates a "real" graph where each edge is a dependency and each vertex (I've called it a node) is an actual artifact. If a solution exists, it is somewhere in this graph.

And sadly, this is where I get stuck. I'm unable to come up with an algorithm or procedure to find the "best" pathway through this graph. I'm also unsure of a way to detect if the graph isn't solvable.

I did some research, and I thought topological sort (tsort) was the process I needed. However, that algorithm determines the insertion order for dependencies, not the best solution.

I'm fairly certain this is an np-hard problem and will likely have an inefficient runtime. I though using a DAG would reduce the number of comparisons I have to do. Am I wrong in this assumption? Is there a better data structure to use?

Final thoughts

  • I have tagged this question "Ruby" because I'm using Ruby, however I am looking for psuedo-code/direction. This isn't a homework problem - I'm genuinely trying to learn.
  • I have tried to give as much background as possible, but please leave a comment if you would like more detail on a particular topic. This is already a lengthy post, but I do have more code I can share.
like image 345
sethvargo Avatar asked Mar 24 '14 16:03

sethvargo


1 Answers

I'm not expert on the problem, I'm proposing a complete solution that is not optimal, as there are many things that can be optimized ..

The algorithm is simple, it's ideally a recursive set . intersection DFS :

Algorithm

Def

Define: Name as String on format [ .* ]
Define: Version as String on format [ dd.dd.dd ]
Define: Revision as { Name, Version, Requirement }
Define: Range<T> as { min<T>, max<T> }
Define: Condition as { Name, Range<Version> }
Define: Requirement as Set<Revision> OR as Set<Condition>
Define: Component as { Name, Range<Version>, Requirement }
Define: System as Set<Component>

Input

Input: T as System aka basis
Input: C as Set<Condition> aka conditions to apply

Initialization

Init: S as Set<Condition> = { S[i] as Condition | S[i] = {T[i].Name,T[i].Range} }
Init: Q as Stack<Condition> = { push(q) | q[i] = C[i] }

Process

for (Condition c in C)
{
    S.find(c.Name).apply(c)
}

While (Q.size > 0)
{
    Condition q = Q.pop()

    switch (T.assert(S.find(q.Name),q))
    {
      case VALID:
        S.find(q.Name).apply(q)
        q.push(S.find(q.Name).Requirement)

      case INVALID:
        S.find(q.Name).set(INVALID)

      case IDENTICAL:
      case SKIP:
    }
}

return S aka Solution

Operations

Stack.push insert an item at the front of a stack

Stack.pop remove an item from the front of a stack

System.assert(Condition a, Condition b):
    if (a is INVALID) then return SKIP
    else if (b.Range = a.Range) then IDENTICAL
    else if (b.Range - a.Range = {}) then VALID
    else INVALID

Set.find(x) search an item based on condition x

Condition.apply(Condition b) = { this.Name, intersection(this.Range,b.Range) }
like image 54
Khaled.K Avatar answered Oct 19 '22 23:10

Khaled.K