I fell across a spreadsheet that explains a method to sort both rows and columns of a matrix that contains binary data so that the number of changes between consecutive rows and cols is minimzed.
For example, starting with:

After 15 manual steps described in the tabs of the spreadsheed, the following table is obtained:

I would like to know:
Thanks !
You can represent each row by a vector L = [1, 1, 0, ... 1], and then define the distance between two lines d(L0, L1) by the number of elements at corresponding positions which are different between L0 and L1. This is known as the binary Hamming distance. If you had non-binary data, you would just extend your definition of distance and yes, Levenshtein distance would be an option.
Once you have distance well-defined, the rest of your problem is minimizing distance between consecutive rows. This is exactly the Traveling salesman problem, which is known to be NP-hard(http://www.diku.dk/hjemmesider/ansatte/jyrki/Paper/EKP85.pdf).
The direct solution (visiting all permutations) is O(n!), but you can do better easily by using dynamic programming, for example Held–Karp_algorithm. There are also approximate algorithms, such as the Nearest_neighbour_algorithm which quickly computes a non-optimal solution.
Finally, for implementations you can easily google "traveling salesman excel/python" and find many tutorials and examples.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With