Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for dependency resolution

Tags:

I'm in the process of writing a package manager, and for that I want the dependency resolution to be as powerful as possible.

Each package has a list of versions, and each version contains the following information:

  • A comparable ID
  • Dependencies (a list of packages and for each package a set of acceptable versions)
  • Conflicts (a list of packages and for each package a set of versions that cause issues together with this version)
  • Provides (a list of packages and for each package a set of versions that this package also provides/contains)

For the current state I have a list of packages and their current versions.

I now want to, given the list of available packages and the current state, be able to get a version for each package in a list of packages, taking the given constraints into account (dependencies, conflicting packages, packages provided by other packages) and get back a list of versions for each of these packages. Circular dependencies are possible.

If no valid state can be reached, the versions of the existing packages may be changed, though this should only be done if necessary. Should it not be possible to reach a valid state as much information to the reason should be available (to tell the user "it could work if you remove X" etc.).

If possible it should also be possible to "lock" packages to a specific version, in which case the version of the package may NOT be changed.

What I'm trying to accomplish is very similar to what existing package managers already do, with the difference that not necessarily the latest version of a package needs to be used (an assumption which most package managers seem to do).

The only idea I have so far is building a structure of all possible states, for all possible versions of the packages in question, and then removing invalid states. I really hope this is not the only solution, since it feels very "brute force"-ish. Staying under a few seconds for ~500 available packages with ~100 versions each, and ~150 installed packages would be a good goal (though the faster the better).

I don't believe this is a language-specific question, but to better illustrate it, here is a bit of pseudecode:

struct Version     integer id     list<Package, set<integer>> dependencies     list<Package, set<integer>> conflicts     list<Package, set<integer>> provides  struct Package     string id     list<Version> versions  struct State     map<Package, Version> packages     map<Package, boolean> isVersionLocked  State resolve(State initialState, list<Package> availablePackages, list<Package> newPackages) {     // do stuff here } 

(if you should have actual code or know about an existing implementation of something that does this (in any language, C++ preferred) feel free to mention it anyway)

like image 397
Jan Avatar asked Jan 22 '15 22:01

Jan


People also ask

What is dependency resolving?

Dependency resolution is a process that consists of two phases, which are repeated until the dependency graph is complete: When a new dependency is added to the graph, perform conflict resolution to determine which version should be added to the graph.


1 Answers

It's NP-hard

Some bad news: This problem is NP-hard, so unless P=NP, there is no algorithm that can efficiently solve all instances of it. I'll prove this by showing how to convert, in polynomial time, any given instance of the NP-hard problem 3SAT into a dependency graph structure suitable for input to your problem, and how to turn the output of any dependency resolution algorithm on that problem back into a solution to the original 3SAT problem, again in polynomial time. The logic is basically that if there was some algorithm that could solve your dependency resolution problem in polynomial time, then it would also solve any 3SAT instance in polynomial time -- and since computer scientists have spent decades looking for such an algorithm without finding one, this is believed to be impossible.

I'll assume in the following that at most one version of any package can be installed at any time. (This is equivalent to assuming that there are implicit conflicts between every pair of distinct versions of the same package.)

First, let's formulate a slightly relaxed version of the dependency resolution problem in which we assume that no packages are already installed. All we want is an algorithm that, given a "target" package, either returns a set of package versions to install that (a) includes some version of the target package and (b) satisfies all dependency and conflict properties of every package in the set, or returns "IMPOSSIBLE" if no set of package versions will work. Clearly if this problem is NP-hard, then so is the more general problem in which we also specify a set of already-installed package versions that are not to be changed.

Constructing the instance

Suppose we are given a 3SAT instance containing n clauses and k variables. We will create 2 packages for each variable: one corresponding to the literal x_k, and one corresponding to the literal !x_k. The x_k package will have a conflict with the !x_k package, and vice versa, ensuring that at most one of these two packages will ever be installed by the package manager. All of these "literal" packages will have just a single version, and no dependencies.

For each clause we will also create a single "parent" package, and 7 versions of a "child" package. Each parent package will be dependent on any of the 7 versions of its child package. Child packages correspond to ways of choosing at least one item from a set of 3 items, and will each have 3 dependencies on the corresponding literal packages. For example, a clause (p, !q, r) will have child package versions having dependencies on the literal packages (p, q, !r), (!p, !q, !r), (!p, q, r), (p, !q, !r), (p, q, r), (!p, !q, r), and (p, !q, r): the first 3 versions satisfy exactly one of the literals p, !q or r; the next 3 versions satisfy exactly 2; and the last satisfies all 3.

Finally, we create a "root" package, which has all of the n parent clause packages as its dependencies. This will be the package that we ask the package manager to install.

If we run the package manager on this set of 2k + 8n + 1 package versions, asking it to install the root package, it will either return "IMPOSSIBLE", or a list of package versions to install. In the former case, the 3SAT problem is unsatisfiable. In the latter case, we can extract values for the variables easily: if the literal package for x_k was installed, set x_k to true; if the literal package !x_k was installed, set x_k to false. (Note that there won't be any variables with neither literal package installed: each variable appears in at least one clause, and each clause produces 7 child package versions, at least one of which must be installed, and which will force installation of one of the two literals for that variable.)

Even some restrictions are hard

This construction doesn't make any use of pre-installed packages or "Provides" information, so the problem remains NP-hard even when those aren't permitted. More interestingly, given our assumption that at most one version of any package can be installed at a time, the problem remains NP-hard even if we don't permit conflicts: instead of making the literals x_k and !x_k separate packages with conflict clauses in each direction, we just make them two different versions of the same package!

like image 80
j_random_hacker Avatar answered Oct 21 '22 06:10

j_random_hacker