Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to remove elements of Vec depending on other elements of the same Vec

I have a vector of sets and I want to remove all sets that are subsets of other sets in the vector. Example:

a = {0, 3, 5}
b = {0, 5}
c = {0, 2, 3}

In this case I would like to remove b, because it's a subset of a. I'm fine with using a "dumb" n² algorithm.

Sadly, it's pretty tricky to get it working with the borrow checker. The best I've come up with is (Playground):

let mut v: Vec<HashSet<u8>> = vec![];

let mut to_delete = Vec::new();
for (i, set_a) in v.iter().enumerate().rev() {
    for set_b in &v[..i] {
        if set_a.is_subset(&set_b) {
            to_delete.push(i);
            break;
        }
    }
}

for i in to_delete {
    v.swap_remove(i);
}

(note: the code above is not correct! See comments for further details)

I see a few disadvantages:

  • I need an additional vector with additional allocations
  • Maybe there are more efficient ways than calling swap_remove often
  • If I need to preserve order, I can't use swap_remove, but have to use remove which is slow

Is there a better way to do this? I'm not just asking about my use case, but about the general case as it's described in the title.

like image 423
Lukas Kalbertodt Avatar asked Jun 19 '16 10:06

Lukas Kalbertodt


People also ask

How do you remove the first element of a vector in C++?

To remove first element of a vector, you can use erase() function. Pass iterator to first element of the vector as argument to erase() function.

How do you remove elements from a vector in Rust?

To remove all elements from a vector in Rust, use . retain() method to keep all elements the do not match. let mut v = vec![ "A", "warm", "fall", "warm", "day"]; let elem = "warm"; // element to remove v.

How do you remove a specific element from a vector?

The erase() function can remove an element from the beginning, within, or end of the vector. In order to remove all the elements from the vector, using erase(), the erase() function has to be repeated the number of times there are elements, beginning from the first element.

How do I remove one element from a vector in R?

To delete an item at specific index from R Vector, pass the negated index as a vector in square brackets after the vector. We can also delete multiple items from a vector, based on index.


3 Answers

Here is a solution that does not make additional allocations and preserves the order:

fn product_retain<T, F>(v: &mut Vec<T>, mut pred: F)
    where F: FnMut(&T, &T) -> bool
{
    let mut j = 0;
    for i in 0..v.len() {
        // invariants:
        // items v[0..j] will be kept
        // items v[j..i] will be removed
        if (0..j).chain(i + 1..v.len()).all(|a| pred(&v[i], &v[a])) {
            v.swap(i, j);
            j += 1;
        }
    }
    v.truncate(j);
}

fn main() {
    // test with a simpler example
    // unique elements
    let mut v = vec![1, 2, 3];
    product_retain(&mut v, |a, b| a != b);
    assert_eq!(vec![1, 2, 3], v);

    let mut v = vec![1, 3, 2, 4, 5, 1, 2, 4];
    product_retain(&mut v, |a, b| a != b);
    assert_eq!(vec![3, 5, 1, 2, 4], v);
}

This is a kind of partition algorithm. The elements in the first partition will be kept and in the second partition will be removed.

like image 180
malbarbo Avatar answered Oct 22 '22 12:10

malbarbo


You can use a while loop instead of the for:

use std::collections::HashSet;

fn main() {
    let arr: &[&[u8]] = &[
        &[3],
        &[1,2,3],
        &[1,3],
        &[1,4],
        &[2,3]
    ];

    let mut v:Vec<HashSet<u8>> = arr.iter()
        .map(|x| x.iter().cloned().collect())
        .collect();

    let mut pos = 0;
    while pos < v.len() {
        let is_sub = v[pos+1..].iter().any(|x| v[pos].is_subset(x)) 
            || v[..pos].iter().any(|x| v[pos].is_subset(x));

        if is_sub {
            v.swap_remove(pos);
        } else {
            pos+=1;
        }
    }
    println!("{:?}", v);
}

There are no additional allocations.


To avoid using remove and swap_remove, you can change the type of vector to Vec<Option<HashSet<u8>>>:

use std::collections::HashSet;

fn main() {
    let arr: &[&[u8]] = &[
        &[3],
        &[1,2,3],
        &[1,3],
        &[1,4],
        &[2,3]
    ];

    let mut v:Vec<Option<HashSet<u8>>> = arr.iter()
        .map(|x| Some(x.iter().cloned().collect()))
        .collect();

    for pos in 0..v.len(){
        let is_sub = match v[pos].as_ref() {
            Some(chk) => 
                v[..pos].iter().flat_map(|x| x).any(|x| chk.is_subset(x)) 
                ||  v[pos+1..].iter().flat_map(|x| x).any(|x| chk.is_subset(x)),
            None => false,
        };

        if is_sub { v[pos]=None };//Replace with None instead remove

    }
    println!("{:?}", v);//[None, Some({3, 2, 1}), None, Some({1, 4}), None]
}
like image 45
aSpex Avatar answered Oct 22 '22 13:10

aSpex


  • I need an additional vector with additional allocations

I wouldn't worry about that allocation, since the memory and runtime footprint of that allocation will be really small compared to the rest of your algorithm.

  • Maybe there are more efficient ways than calling swap_remove often.
  • If I need to preserve order, I can't use swap_remove, but have to use remove which is slow

I'd change to_delete from Vec<usize> to Vec<bool> and just mark whether a particular hashmap should be removed. You can then use the Vec::retain, which conditionaly removes elements while preserving order. Unfortunately, this function doesn't pass the index to the closure, so we have to create a workaround (playground):

let mut to_delete = vec![false; v.len()];
for (i, set_a) in v.iter().enumerate().rev() {
    for set_b in &v[..i] {
        if set_a.is_subset(&set_b) {
            to_delete[i] = true;
        }
    }
}

{
    // This assumes that retain checks the elements in the order.
    let mut i = 0;
    v.retain(|_| {
        let ret = !to_delete[i];
        i += 1;
        ret
    });
}

If your hashmap has a special value which can never occur under normal conditions, you can use it to mark a hashmap as "to delete", and then check that condition in retain (it would require changing the outer loop from iterator-based to range-based though).


Sidenote (if that HashSet<u8> is not just a toy example): More eficient way to store and compare sets of small integers would be to use a bitset.

like image 1
krdln Avatar answered Oct 22 '22 11:10

krdln