How to sort two arrays for smoothness

Tags:

Say I have two arrays of the same size A and B. For definiteness lets assume they are two dimensional. The values in the two arrays represent some data which should smoothly depend on the position in the array. However some of the values in A have been swapped with their corresponding value in B. I would like to reverse this swapping, but am struggling to find a criterion to tell me when two values should be swapped.

An example should help. Here is some python code that creates two such arrays and randomly swaps some of their elements

import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random

### create meshgrid ###
x = np.linspace(-10,10,15);
y = np.linspace(-10,10,11);

[X,Y] = np.meshgrid(x,y);

### two sufficiently smooth functions on the meshgrid ###
A = -X**2-Y**2;
B = X**2-Y**2-100;

### plot ###
ax=plt.subplot(2,2,1)
im1=ax.imshow(A,extent=[-10, 10, -10, 10])
ax.set_title('A')
ax2=plt.subplot(2,2,2)
im2=ax2.imshow(B,extent=[-10, 10, -10, 10])
ax2.set_title('B')

### randomly exchange a few of the elements of A and B ###
for i in np.arange(0,15):
    for j in np.arange(0,11):
        randNumb = random.random();
        if randNumb>0.8:
            mem=A[j,i];
            A[j,i] = B[j,i];
            B[j,i] = mem;

### plot for comparison ###
ax=plt.subplot(2,2,3)
im1=ax.imshow(A,extent=[-10, 10, -10, 10])
ax2=plt.subplot(2,2,4)
im2=ax2.imshow(B,extent=[-10, 10, -10, 10])

plt.show()

This results in the following plot:

enter image description here

The upper two plots are the original arrays A and B, the lower two are the shuffled versions. The exercise is now to reverse this process, i.e. to create the original arrays A and B from the shuffled versions.

A note about what I mean by 'smooth'. Of course the algorithm will only work if the original data is actually sufficiently smooth, meaning that neighbouring data points in one array are sufficiently close in value for all point. A solution may assume that this is the case.

Also note that this exercise is super easy to do by eye in the example above. The problem I'm having in implementing this is finding a good criterion to tell me when to swap to cells.

Note that the reverse transformation is allowed to relabel A and B of course.

929

asked Oct 23 '17 09:10

Wolpertinger

2 Answers

One robust method is to compare the mean of the 4 neighbor pixel values to the value actually present in each pixel. That is, for each pixel, compute the mean of the 4 neighbor pixels in both A and B, and compare these to the actual value of this pixel in both A and B. The following condition works nicely, and is really a sort of least-squares method:

if (  (A[i, j] - A_mean)**2 + (B[i, j] - B_mean)**2
    > (A[i, j] - B_mean)**2 + (B[i, j] - A_mean)**2
    ):
    # Do swap

Here, A_mean and B_mean are the mean of the 4 neighbor pixels.

Another important thing to consider is the fact that one sweep through all pixels is not necessarily enough: It might happen that after one sweep, the correction swaps has made it possible for the above condition to recognize more pixels which ought to be swapped. Thus, we have to sweep over the arrays until a "steady state" has been found.

Complete code

import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random

### create meshgrid ###
x = np.linspace(-10,10,15);
y = np.linspace(-10,10,11);

[X,Y] = np.meshgrid(x,y);

### two sufficiently smooth functions on the meshgrid ###
A = -X**2-Y**2;
B = X**2-Y**2-100;

### plot ###
ax=plt.subplot(3,2,1)
im1=ax.imshow(A,extent=[-10, 10, -10, 10])
ax.set_title('A')
ax2=plt.subplot(3,2,2)
im2=ax2.imshow(B,extent=[-10, 10, -10, 10])
ax2.set_title('B')

### randomly exchange a few of the elements of A and B ###
for i in np.arange(0,15):
    for j in np.arange(0,11):
        randNumb = random.random();
        if randNumb>0.8:
            mem=A[j,i];
            A[j,i] = B[j,i];
            B[j,i] = mem;

### plot for comparison ###
ax=plt.subplot(3,2,3)
im1=ax.imshow(A,extent=[-10, 10, -10, 10])
ax2=plt.subplot(3,2,4)
im2=ax2.imshow(B,extent=[-10, 10, -10, 10])

### swap back ###
N, M = A.shape
swapped = True
while swapped:
    swapped = False
    for i in range(N):
        for j in range(M):
            A_mean = np.mean([A[i - 1    , j - 1    ],
                              A[i - 1    , (j + 1)%M],
                              A[(i + 1)%N, j - 1    ],
                              A[(i + 1)%N, (j + 1)%M],
                              ])
            B_mean = np.mean([B[i - 1    , j - 1    ],
                              B[i - 1    , (j + 1)%M],
                              B[(i + 1)%N, j - 1    ],
                              B[(i + 1)%N, (j + 1)%M],
                              ])
            if (  (A[i, j] - A_mean)**2 + (B[i, j] - B_mean)**2
                > (A[i, j] - B_mean)**2 + (B[i, j] - A_mean)**2
                ):
                # Do swap
                A[i, j], B[i, j] = B[i, j], A[i, j]
                swapped = True

### plot result ###
ax=plt.subplot(3,2,5)
im1=ax.imshow(A,extent=[-10, 10, -10, 10])
ax2=plt.subplot(3,2,6)
im2=ax2.imshow(B,extent=[-10, 10, -10, 10])

plt.show()

Note that the above code consider the arrays to be periodic, in the sense that the neighbor pixels of pixels at the boundary are chosen as those on the opposite boundary (which is the case for the arrays you provided in the example). This index-wrapping happens automatically when the index becomes negative, but not when the index becomes larger than or equal to the given dimension of the array, which is why the modulo operator % is used.

As a bonus trick, notice how I swap A[i, j] and B[i, j] without the need of the temporary mem variable. Also, my outer loop is over the first dimension (the one with length 11), while my inner loop is over the second dimension (the one with length 15). This is faster than doing the reverse loop order, since now each element is visited in contiguous order (the order in which they actually exist in memory).

Finally, note that this method is not guaranteed to always work. It may happen that so many nearby pixels are swapped that the "correct" solution cannot be found. This will however be the case regardless of what criterion you choose to determine whether two pixels should be swapped or not.

Edit (non-periodic arrays)

For non-periodic arrays, the boundary pixels will have fewer than 4 neighbors (3 for edge pixels, 2 for corner pixels). Something along these lines:

A_neighbors = []
B_neighbors = []
if i > 0 and j > 0:
    A_neighbors.append(A[i - 1, j - 1])
    B_neighbors.append(B[i - 1, j - 1])
if i > 0 and j < M - 1:
    A_neighbors.append(A[i - 1, j + 1])
    B_neighbors.append(B[i - 1, j + 1])
if i < N - 1 and j > 0:
    A_neighbors.append(A[i + 1, j - 1])
    B_neighbors.append(B[i + 1, j - 1])
if i < N - 1 and j < M - 1:
    A_neighbors.append(A[i + 1, j + 1])
    B_neighbors.append(B[i + 1, j + 1])
A_mean = np.mean(A_neighbors)
B_mean = np.mean(B_neighbors)

Note that with fewer neighbors, the method becomes less robust. You could also experiment with using the nearest 8 pixels (that is, include diagonal neighbors) rather than just 4.

159

answered Sep 28 '22 17:09

jmd_dk

Note: You might want to restate your question on MathOverflow.

First, iterate over all horizontal and vertical neighbours. (It's up to you to also consider diagonals or not.)

Then, calculate the difference of all "neighbour" values.

Finally, there are two popular choices:

A) Sum over the abs() (absolute value) of all differences
B) Sum over the square of all differences

Your optimization goal is to minize that sum.

Variant A) is usually more intuitive, while variant B) is usually easier to track with optimization tools.

(The problem with A) is that the function is smooth, but not differentiable, while B) is smooth and differentiable, i.e. B) "behaves" better under mathematical analysis.)

answered Sep 28 '22 17:09

vog

Related questions
                            
                                AttributeError:'list' object has no attribute 'size'
                            
                                Stopping a python process so that context managers still call __exit__ (in windows)
                            
                                Split equivalent of gzip files in python
                            
                                Multicursor selection in Jupyter without mouse
                            
                                Python Errno 9 Bad file descriptor in Mac OS X
                            
                                Handling worker death in multiprocessing Pool
                            
                                Can I use Environment Markers in tests_require in setup.py?
                            
                                pip install specific fork
                            
                                Do I need to use one_hot encoding if my output variable is binary?
                            
                                Plotting 3D image form a data in NumPy-array
                            
                                Matplotlib Make Center Circle Transparent
                            
                                Multi-threading in Python: Getting stuck at last thread
                            
                                Why subprocess.Popen returncode differs for similar commands with bash
                            
                                Use python code for build android app in android studio [closed]
                            
                                How can I fix a broken pip under anaconda
                            
                                Tensorboard scalar plotting with epoch number on the horizontal axis
                            
                                tkinter.TclError: character U+1f449 is above the range (U+0000-U+FFFF) allowed by Tcl
                            
                                Matplotlib 3D Waterfall Plot with Colored Heights
                            
                                OpenAI gym player mode
                            
                                ValueError: exog does not have full column rank

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to sort two arrays for smoothness

Tags:

python

arrays

algorithm

sorting