So I talked with some colleagues and the problem I currently have is actually quite challenging. The context behind this problem has to do with mass spectrometry and assigning structure to different peaks that the software gives.
But to break it down into a optimization problem, I have a certain target value. I also have a list of various inputs whose sum I want to be as close as possible to the target as possible.
So as an example, here is what I have.
List of inputs: [18.01, 42.01, 132.04, 162.05, 203.08, 176.03]
Target value: 1800.71
I want to find all the possible combinations of the inputs listed whose sum is within 0.5 of 1800.71. So the sum can be anywhere between 1800.21 and 1801.21.
I already know two inputs could be:
[18.01, 162.05, 162.05, 162.05, 162.05, 162.05, 162.05, 162.05, 162.05, 162.05, 162.05, 162.05] **which gives a sum of 1800.59**
and
[18.01, 18.01, 203.08, 203.08, 203.08, 162.05, 203.08, 18.01, 18.01, 18.01, 18.01, 18.01, 18.01, 18.01, 18.01, 18.01, 18.01, 42.01, 162.05, 203.08, 203.08] **which gives a sum 1800.71**
I'm NOT looking to find the combination that gets me as close as possible to the target value; I am interested in ALL the possible combinations that are within 0.5 of the target value.
If anyone could help me with this problem I would greatly appreciate it!
Instead of allowing multiple values, it would be a lot faster to just compute an integer factor for every value.
For your problem, I get 988 results.
import math
import time
def combinator(tolerance, target, inputs):
# Special case for inputs with one element, speeds up computation a lot
if len(inputs) == 1:
number = inputs[0]
result_min = int(math.ceil((target-tolerance)/number))
result_max = int(math.floor((target+tolerance)/number))
for factor in range(result_min, result_max+1):
yield [factor]
return
# Special case for no inputs, just to prevent infinite recursion
if not inputs:
return
number = inputs[-1]
max_value = int(math.floor((target + tolerance)/number))
for i in range(max_value+1):
for sub_factors in combinator(tolerance, target-i*number, inputs[:-1]):
sub_factors.append(i)
yield sub_factors
def main():
inputs = [18.01, 42.01, 132.04, 162.05, 203.08, 176.03]
target = 1800.71
tolerance = 0.5
t_start = time.perf_counter()
results = list(combinator(tolerance, target, inputs))
t_end = time.perf_counter()
for result in results:
result_str = ""
result_value = 0
for factor, value in zip(result, inputs):
if not factor:
continue
if result_str != "":
result_str += " + "
result_str += "{}* {}".format(factor, value)
result_value += factor*value
print("{:.2f}".format(result_value) + " =\t[" + result_str + "]")
print("{} results found!".format(len(results)))
print("Took {:.2f} milliseconds.".format((t_end-t_start)*1000))
if __name__ == "__main__":
main()
1801.00 = [100* 18.01]
1800.96 = [93* 18.01 + 3* 42.01]
1800.92 = [86* 18.01 + 6* 42.01]
...
1800.35 = [5* 18.01 + 3* 42.01 + 9* 176.03]
1800.33 = [2* 42.01 + 1* 132.04 + 9* 176.03]
1800.35 = [3* 18.01 + 1* 162.05 + 9* 176.03]
988 results found!
Took 11.48 milliseconds.
I also reimplemented the same algorithm in Rust.
Performance on your problem:
Here is the code:
use std::time::Instant;
fn combinator(tolerance : f32, target: f32, inputs: &[f32]) -> Vec<Vec<i32>>{
let number = match inputs.last() {
Some(i) => i,
None => return vec![]
};
if inputs.len() == 1 {
let result_min = ((target-tolerance)/number).ceil() as i32;
let result_max = ((target+tolerance)/number).floor() as i32;
return (result_min..=result_max).map(|x| vec![x]).collect();
}
let max_value = ((target + tolerance)/number).floor() as i32;
let mut results = vec![];
for i in 0..=max_value {
for mut sub_factors in combinator(tolerance, target - i as f32 * number, &inputs[..inputs.len()-1]) {
sub_factors.push(i);
results.push(sub_factors);
}
}
results
}
fn print_result(factors: &[i32], values: &[f32]){
let sum : f32 = factors.iter()
.zip(values.iter())
.map(|(factor,value)| *factor as f32 * *value)
.sum();
println!("{:.2} =\t[{}]", sum,
factors.iter()
.zip(values.iter())
.filter(|(factor, _value)| **factor > 0)
.map(|(factor, value)| format!("{}* {}", factor, value))
.collect::<Vec<String>>()
.join(", "));
}
fn main() {
let inputs = vec![18.01, 42.01, 132.04, 162.05, 203.08, 176.03];
let target = 1800.71;
let tolerance = 0.5;
let t_start = Instant::now();
let results = combinator(tolerance, target, &inputs);
let duration = t_start.elapsed().as_micros() as f64;
for result in &results {
print_result(&result, &inputs);
}
println!("{} results found!", results.len());
println!("Took {} milliseconds", duration / 1000.0);
}
1801.00 = [100* 18.01]
1800.96 = [93* 18.01, 3* 42.01]
1800.92 = [86* 18.01, 6* 42.01]
...
1800.35 = [5* 18.01, 3* 42.01, 9* 176.03]
1800.33 = [2* 42.01, 1* 132.04, 9* 176.03]
1800.35 = [3* 18.01, 1* 162.05, 9* 176.03]
988 results found!
Took 0.656 milliseconds
Also, just for fun, those are the exact solutions to your problem. There are 5 of them.
1800.71 = [12* 18.01, 1* 42.01, 2* 162.05, 6* 203.08]
1800.71 = [13* 18.01, 2* 42.01, 2* 132.04, 6* 203.08]
1800.71 = [16* 18.01, 7* 42.01, 6* 203.08]
1800.71 = [52* 18.01, 1* 42.01, 1* 132.04, 1* 162.05, 3* 176.03]
1800.71 = [54* 18.01, 4* 42.01, 1* 132.04, 3* 176.03]
Another answer in the same vein as the existing fine answers. I found it simpler to use a range instead of a target + tolerance, and use a trivial (unoptimized) recursive solution, which seems fast enough to find the ~1000 answers to your use case.
Changing to use generators/yield or to optimize the single value case did not change the time taken for all results, though you may find it useful if you have a pipeline.
def fuzzy_coins(vals, lower, upper):
'''
vals: [Positive]
lower: Positive
upper: Positive
return: [[Int]]
Returns a list of coefficients for vals such that the dot
product of vals and return falls between lower and upper.
'''
ret = []
if not vals:
if lower <= 0 <= upper:
ret.append(())
else:
val = vals[-1]
for i in xrange(int(upper / val) + 1):
for sub in fuzzy_coins(vals[:-1], lower, upper):
ret.append(sub + (i,))
lower -= val
upper -= val
return ret
Even so, this takes ~100ms in python 2.7 and 3.6
[('1800.33', (0, 2, 1, 0, 0, 9)),
('1800.35', (3, 0, 0, 1, 0, 9)),
('1800.35', (5, 3, 0, 0, 0, 9)),
('1800.38', (0, 10, 0, 2, 0, 6)),
('1800.38', (1, 11, 2, 0, 0, 6)),
...
('1800.92', (86, 6, 0, 0, 0, 0)),
('1800.94', (88, 2, 1, 0, 0, 0)),
('1800.96', (91, 0, 0, 1, 0, 0)),
('1800.96', (93, 3, 0, 0, 0, 0)),
('1801.00', (100, 0, 0, 0, 0, 0))]
Took 0.10885s to get 988 results
e.g. usage:
from __future__ import print_function
import pprint
import time
def main():
vals = [18.01, 42.01, 132.04, 162.05, 203.08, 176.03]
target = 1800.71
fuzz = .5
lower = target - fuzz
upper = target + fuzz
start = time.time()
coefs = fuzzy_coins(vals, lower, upper)
end = time.time()
pprint.pprint(sorted(
('%.2f' % sum(c * v for c, v in zip(coef, vals)), coef)
for coef in coefs
))
print('Took %.5fs to get %d results' % (end - start, len(coefs)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With