Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

c++ remove items from vector whose values are within an interval

Is there a convenient way to remove items from a vector (or another stl container) whose values are within an certain interval?

So for example: I got an vector with float values

1.1 1.3 2.2 3.2 4.1 5.2 5.1 1.1 8.0 2.1

and a delta of 0.2, which should lead to the following result

1.1 2.2 3.2 4.1 5.1 8.0

thus removing all "duplicate" items within the delta and keep one of the values within the range. It can be assumed that the values are "clustered", where the difference between these is more than 3*delta. Only one value (the mean) of the cluster should be kept, all the others from the cluster should be removed.

For sure, it is possible to iterate with nested loops, but this seems quite complicated, because of the changing iterators, so I thought of a more convenient way. I found remove_if for example, but this function cannot "compare".

Thanks for suggestions.

like image 399
the_max Avatar asked Dec 12 '22 21:12

the_max


2 Answers

You can use std::unique with a predicate:

template <typename It, typename Predicate>
It unique(It first, It last, Predicate pred);

The most used form of std::unique takes no predicate and just removes duplicates from a sequence. But you can write one that implements your filter (in your case, compares two values using a gap) and you are set. Something like:

bool CompareWithGap(double a, double b)
{
  return abs(a - b) <= 0.2;
}

And use it to call std::unique:

auto it = std::unique(v.begin(), v.end(), CompareWithGap);

Where v is your vector (or any other sequence).

EDIT: Forgot to mention that you need to sort your sequence before using std::unique. If this is not an option, you have to write your own algorithm.

2nd EDIT: For completion, as pointed out by Christian Rau in his comment, you can now erase the items removed from your sequence by using the erase method in it:

v.erase(it, v.end());
like image 120
Gorpik Avatar answered May 21 '23 13:05

Gorpik


You question is actually quite complicated because 'one of the values within the range' is not well defined. For instance given

1.1 1.2 1.3

which do you want to keep? Presumably the first one, i.e. 1.1. OK now how about

0.9 1.1 1.3

Following the first one rule, we keep 0.9 and 1.3, But we could have kept just 1.1 instead. The question I think is are 0.9 and 1.3 'duplicates' or not? I don't think you've defined this well enough.

What about this case, 1.1 1.2 1.3 1.4 1.5 1.6? All values are within 0.2 of one other value but not all are within 0.2 of every other value. So are they all duplicates? Or do you need to split them up? If so how should they be split up, pick 1.1 and 1.4 perhaps?

So I think until you define your problem a bit more precisely it's not really possible for you to write the code or for us to help you.

You might want to look at the disjoint-set data structure. Depending on precisely what you are trying to do it may be the most efficient way to solve this problem.

like image 35
john Avatar answered May 21 '23 13:05

john