Given two file trees A and B, is it possible to determine the shortest sequence of operations or a short sequence of operations that is necessary in order to transform A to B?
An operation can be:
A and B are identical when they will have the same files with the same contents (or same size same CRC) and same name, in the same folder structure.
This question has been puzzling me for some time. For the moment I have the following, basic idea:
However I think that this is really a suboptimal way to do that. What could you give me as advice?
Thank you!
This problem is a special case of the tree edit distance problem, for which finding an optimal solution is (unfortunately) known to be NP-hard. This means that there probably aren't any good, fast, and accurate algorithms for the general case.
That said, the paper I linked does contain several nice discussions of approximation algorithms and algorithms that work in restricted cases of the problem. You may find the discussion interesting, as it illuminates many of the issues that actually arise in solving this problem.
Hope this helps! And thanks for posting an awesome question!
You might want to check out tree-edit distance algorithms. I don't know if this will map neatly to your file system, but it might give you some ideas.
https://github.com/irskep/sleepytree (code and paper)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With