I fell across a spreadsheet that explains a method to sort both rows and columns of a matrix that contains binary data so that the number of changes between consecutive rows and cols is minimzed.
For example, starting with:
After 15 manual steps described in the tabs of the spreadsheed, the following table is obtained:
I would like to know:
Thanks !
You can represent each row by a vector L = [1, 1, 0, ... 1]
, and then define the distance between two lines d(L0, L1)
by the number of elements at corresponding positions which are different between L0
and L1
. This is known as the binary Hamming distance. If you had non-binary data, you would just extend your definition of distance and yes, Levenshtein distance would be an option.
Once you have distance well-defined, the rest of your problem is minimizing distance between consecutive rows. This is exactly the Traveling salesman problem, which is known to be NP-hard(http://www.diku.dk/hjemmesider/ansatte/jyrki/Paper/EKP85.pdf).
The direct solution (visiting all permutations) is O(n!), but you can do better easily by using dynamic programming, for example Held–Karp_algorithm. There are also approximate algorithms, such as the Nearest_neighbour_algorithm which quickly computes a non-optimal solution.
Finally, for implementations you can easily google "traveling salesman excel/python" and find many tutorials and examples.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With