It's possible to conceive of a modification to bubble sort where the "swap" occurs randomly with probability p
, rather than by performing a comparison. The result could be called a "bubble shuffle". Elements near the front would be likely to remain there, but have a chance to be shifted to the back of the list.
Modifying a bubble sort stolen from the internet you could come up with the following:
import random
def bubble_shuffle(arr, p):
arr = copy.copy(arr)
n = len(arr)
# Traverse through all array elements
for i in range(n-1):
# range(n) also work but outer loop will repeat one time more than needed.
# Last i elements are already in place
for j in range(0, n-i-1):
# traverse the array from 0 to n-i-1
# Swap if random number [0, 1] is less than p
if random.random() < p:
arr[j], arr[j+1] = arr[j+1], arr[j]
return arr
This algorithm is order n-squared... but the probability of an element ending up in any particular spot should be computable ahead of time, so it doesn't "need" to be n-squared. Is there a more efficient approach which could be taken?
I'd considered sampling from a geometric distribution and add this to the original index (plus (len(n)-i-1)/len(n)
to break ties), but this doesn't give the same dynamic.
I agree with btilly and others that the correlations make this very difficult if not impossible to do exactly.
Regarding your approach, it's true that the motion for a single pass is sort of geometrically distributed. For many passes, however, the Central Limit Theorem starts to kick in. Ignoring boundary effects, in a single pass, an element moves left once with probability p
, and otherwise (with probability (1-p)
) moves right a geometric number of times with success probability 1-p
. The mean of this distribution is zero. The first possibility contributes p (-1)^2 = p
to the variance. The second contributes (1-p) sum_{i>=0} p^i (1-p) i^2
, which Wolfram Alpha evaluates as (1+p) p / (1-p)
.
Letting this variance be v = p + (1+p) p / (1-p)
, we can imagine that the delta position of an element after t
passes is distributed normally with mean zero and standard deviation sqrt(t v)
. Our next approximations are to switch from discrete to continuous time and, for each element, to pull a normal sample x
and assume that the delta position changes smoothly as sqrt(t v) x
. For an element originally in position i
, we can solve the equation i + sqrt(t v) x = n - t
for t
to approximate when that element was locked. Then we just sort by those t
descending.
Here's a short Python program implementing this. Hopefully it gets close enough.
import math
import random
def variance(p):
return p + (1 + p) * p / (1 - p)
def solve_quadratic(b, c):
assert c < 0
return (math.sqrt(b ** 2 - 4 * c) - b) / 2
def bubble_shuffle(arr, p):
n = len(arr)
s = math.sqrt(variance(p))
return [
arr[i]
for i in sorted(
range(n),
key=lambda i: solve_quadratic(random.gauss(0, s), i - n),
reverse=True,
)
]
if __name__ == "__main__":
print(bubble_shuffle(range(100), 0.5))
With precomputation that only needs to be done once per (n, p), we can simulate bubble_shuffle runs in expected linear time (excluding precomputation).
Methods: get_bub(n, p): O(n^2) method to run bubble_shuffle
get_expected_bub(n, p): O(n^2) method to calculate the expected average value in each position of running bubble_shuffle
get_dist(pos, p): O(1) method used by simbub that gets a random number of consecutive swaps based on the p being used.
get_simbub(n, p_arr): Expected O(n * min(n, (1/(1-p))) method to simulate running bubble_shuffle. For p = 0.5, this is expected O(n). For p = 1 - (1/n) this is O(n^2).
get_expected_simbub(n, p_arr): O(n^2) method to calculate the expected average value in each position of running simbub.
get_p_arr(n, p, tolerance): Method for seeking the p_arr that aligns simbub with bubble_shuffle (within tolerance) for a given n & p.
compare(n, p, p_arr, trials): Method to run simbub multiple times and compare the results to the expectation for bubble_shuffle.
time_trials(n, p, seconds): For a given n & p, run both bubble_shuffle and simbub for the input seconds and compare how many runs we're able to complete.
All code is in Ruby.
# Run bubble_shuffle
def get_bub(n, p)
arr = [*0..(n-1)]
0.upto(n-1) do |i|
0.upto(n-i-2) do |j|
if rand < p
arr[j], arr[j+1] = arr[j+1], arr[j]
end
end
end
return arr
end
# Get the expected average results of running bubble_shuffle many times
# This works by iteratively distributing value according to p.
def get_expected_bub(n, p)
arr = [*0.upto(n-1)]
(n-1).downto(0) do |last_index|
working_arr = arr.clone
0.upto(last_index) do |i|
working_arr[i] = 0
end
0.upto(last_index) do |source_index|
min_sink = [0, source_index-1].max
max_sink = last_index
min_sink.upto(max_sink) do |sink_index|
portion = 1.0
if sink_index == source_index - 1
portion *= p
else
portion *= (1-p) if source_index > 0
portion *= (p**(sink_index - source_index)) if sink_index > source_index
portion *= (1-p) if sink_index < last_index
end
working_arr[sink_index] += arr[source_index] * portion
end
end
0.upto(last_index) do |i|
arr[i] = working_arr[i]
end
end
return arr
end
# For simbub, randomly get the distance to the index being swapped into
# the current position
def get_dist(pos, p)
return 0 if pos == 0
return [pos, Math.log(1 - rand, p).floor].min
end
# Run simbub from the last-to-first index
# p_arr is the array of probabilities corresponding to the effective probability
# of swapping used at each position. The last value of this array will always
# equal the p value being simulated. So will the first, though this is not used.
def get_simbub(n, p_arr)
arr = [*0..(n-1)]
(n-1).downto(0) do |pos|
p = p_arr[pos]
dist = get_dist(pos, p)
if dist > 0
val_moving_up = arr[pos - dist]
(pos - dist).upto(pos - 1) do |j|
arr[j] = arr[j+1]
end
arr[pos] = val_moving_up
end
end
return arr
end
# Get the expected average results of running simbub many times
# This works by iteratively distributing value according to p_arr.
def get_expected_simbub(n, p_arr)
arr = [*0.upto(n-1)]
(n-1).downto(1) do |last_index|
working_arr = arr.clone
0.upto(last_index) do |i|
working_arr[i] = 0
end
p = p_arr[last_index]
cum_p_distance = 0
0.upto(last_index) do |distance|
if distance == last_index
p_distance = p ** distance
else
p_distance = (1-p) * (p ** distance)
end
working_arr[last_index] += p_distance * arr[last_index - distance]
if distance >= 1
working_arr[last_index - distance] = arr[last_index - distance] + (1 - cum_p_distance) * (arr[last_index - distance + 1] - arr[last_index - distance])
end
cum_p_distance += p_distance
end
arr = working_arr
end
return arr
end
# Solve for the p_arr that yields the same expected averages for simbub for
# each position (within tolerance) as bub
def get_p_arr(n, p, tolerance = 0.00001)
expected_bub = get_expected_bub(n, p)
p_arr = [p] * n
(n-2).downto(1) do |pos|
min_pos_p = 0.0
max_pos_p = 1.0
while true do
expected_simbub = get_expected_simbub(n, p_arr)
if expected_simbub[pos] > expected_bub[pos] + tolerance
min_pos_p = p_arr[pos]
p_arr[pos] = (p_arr[pos] + max_pos_p) / 2.0
elsif expected_simbub[pos] < expected_bub[pos] - tolerance
max_pos_p = p_arr[pos]
p_arr[pos] = (p_arr[pos] + min_pos_p) / 2.0
else
break
end
end
end
return p_arr
end
def compare(n, p, p_arr, trials)
expected_bub = get_expected_bub(n, p)
#bub_totals = [0]*n
simbub_totals = [0]*n
trials.times do
simbub_trial = get_simbub(n, p_arr, 0)
#bub_trial = bub(n, p)
0.upto(n-1) do |i|
simbub_totals[i] += simbub_trial[i]
#bub_totals[i] += bub_trial[i]
end
end
puts " #: expbub | simbub | delta"
0.upto(n-1) do |i|
#b = bub_totals[i] / trials.to_f
b = expected_bub[i]
s = simbub_totals[i] / trials.to_f
puts "#{(i).to_s.rjust(4)}: #{b.round(2).to_s.rjust(7)} | #{s.round(2).to_s.rjust(7)} | #{(s-b).round(2).to_s.rjust(7)}"
end
end
def time_trials(n, p, seconds)
t = Time.now
bub_counter = 0
while Time.now < t + seconds do
get_bub(n, p)
bub_counter += 1
end
t = Time.now
p_arr = get_p_arr(n, p, 0.0001)
p_arr_seconds = Time.now - t
t = Time.now
simbub_counter = 0
while Time.now < t + seconds do
get_simbub(n, p_arr)
simbub_counter += 1
end
puts "Trial results (#{seconds} seconds): "
puts "Time to get p_arr for simbub: #{p_arr_seconds.round(2)}"
puts "bub runs: #{bub_counter}"
puts "simbub runs: #{simbub_counter}"
puts "ratio: #{(simbub_counter.to_f/bub_counter.to_f).round(2)}"
end
Errors vs expectation for n=100, p=0.5
compare(100, 0.5, p_arr, 10000)
#: expbub | simbub | delta
0: 10.27 | 10.23 | -0.04
1: 10.27 | 10.18 | -0.09
2: 10.33 | 10.16 | -0.16
3: 10.44 | 10.45 | 0.01
4: 10.61 | 10.66 | 0.05
5: 10.83 | 10.83 | -0.01
6: 11.11 | 11.1 | -0.02
7: 11.45 | 11.5 | 0.05
8: 11.84 | 11.92 | 0.08
9: 12.27 | 12.35 | 0.08
10: 12.76 | 12.78 | 0.02
11: 13.29 | 13.23 | -0.06
12: 13.87 | 13.72 | -0.15
13: 14.49 | 14.58 | 0.09
14: 15.15 | 15.14 | -0.01
15: 15.85 | 15.83 | -0.02
16: 16.58 | 16.51 | -0.06
17: 17.34 | 17.35 | 0.01
18: 18.13 | 18.26 | 0.13
19: 18.95 | 19.0 | 0.05
20: 19.79 | 19.75 | -0.04
21: 20.66 | 20.85 | 0.19
22: 21.54 | 21.7 | 0.16
23: 22.45 | 22.64 | 0.19
24: 23.36 | 23.49 | 0.13
25: 24.29 | 24.19 | -0.11
26: 25.24 | 25.17 | -0.07
27: 26.19 | 26.38 | 0.19
28: 27.15 | 27.16 | 0.01
29: 28.12 | 28.16 | 0.05
30: 29.09 | 28.99 | -0.1
31: 30.07 | 30.08 | 0.0
32: 31.05 | 31.19 | 0.14
33: 32.04 | 31.88 | -0.16
34: 33.03 | 33.07 | 0.03
35: 34.02 | 33.78 | -0.24
36: 35.02 | 34.97 | -0.05
37: 36.01 | 36.05 | 0.04
38: 37.01 | 37.0 | -0.01
39: 38.01 | 37.95 | -0.06
40: 39.0 | 38.94 | -0.07
41: 40.0 | 39.94 | -0.06
42: 41.0 | 41.01 | 0.0
43: 42.0 | 42.08 | 0.08
44: 43.0 | 42.87 | -0.13
45: 44.0 | 43.88 | -0.12
46: 45.0 | 44.99 | -0.02
47: 46.0 | 45.92 | -0.08
48: 47.0 | 46.8 | -0.2
49: 48.0 | 47.92 | -0.08
50: 49.0 | 49.01 | 0.01
51: 50.0 | 50.04 | 0.04
52: 51.0 | 51.11 | 0.11
53: 52.0 | 51.95 | -0.05
54: 53.0 | 53.08 | 0.08
55: 54.0 | 54.05 | 0.05
56: 55.0 | 54.95 | -0.05
57: 56.0 | 55.98 | -0.02
58: 57.0 | 57.13 | 0.13
59: 58.0 | 58.01 | 0.01
60: 59.0 | 59.11 | 0.11
61: 60.0 | 60.01 | 0.01
62: 61.0 | 61.02 | 0.02
63: 62.0 | 61.93 | -0.07
64: 63.0 | 63.05 | 0.05
65: 64.0 | 64.01 | 0.01
66: 65.0 | 65.0 | -0.0
67: 66.0 | 66.04 | 0.04
68: 67.0 | 67.11 | 0.11
69: 68.0 | 68.01 | 0.01
70: 69.0 | 69.03 | 0.03
71: 70.0 | 70.08 | 0.08
72: 71.0 | 70.96 | -0.04
73: 72.0 | 72.01 | 0.01
74: 73.0 | 72.95 | -0.05
75: 74.0 | 74.0 | -0.0
76: 75.0 | 74.99 | -0.01
77: 76.0 | 75.92 | -0.08
78: 77.0 | 76.98 | -0.02
79: 78.0 | 77.91 | -0.09
80: 79.0 | 79.05 | 0.05
81: 80.0 | 79.96 | -0.04
82: 81.0 | 81.0 | -0.0
83: 82.0 | 82.0 | -0.0
84: 83.0 | 82.98 | -0.02
85: 84.0 | 84.06 | 0.06
86: 85.0 | 84.99 | -0.01
87: 86.0 | 85.97 | -0.03
88: 87.0 | 87.0 | -0.0
89: 88.0 | 88.04 | 0.04
90: 89.0 | 88.95 | -0.05
91: 90.0 | 90.03 | 0.03
92: 91.0 | 91.01 | 0.01
93: 92.0 | 91.97 | -0.03
94: 93.0 | 92.98 | -0.02
95: 94.0 | 94.01 | 0.01
96: 95.0 | 94.99 | -0.01
97: 96.0 | 95.97 | -0.03
98: 97.0 | 97.03 | 0.03
99: 98.0 | 98.0 | -0.0
Time Trials: (simbub runs in 60s) / (bubble_shuffle runs in 60s)
p=0.01 p=0.25 p=0.50 p=0.75 p=0.99
n = 100 10.85 10.17 10.75 9.53 4.16
n = 200 22.98 18.11 17.30 13.46 5.33
n = 300 27.70 25.03 23.88 18.11 5.94
n = 400 41.09 29.46 27.11 21.81 6.92
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With