Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does Rust contain a way to directly check whether or not one vector is a "substring" of another?

Tags:

vector

rust

You can do this with a String using contains which searches for a pattern, but Vec::contains is for a single element.

The only way I've been able to do this is by directly implementing some kind of substring function, but I'm sort of hoping there's a built-in way.

let vec1 = vec![1, 2, 3, 4, 5];
let vec2 = vec![2, 3]; // vec2 IS a substring of vec1
let vec3 = vec![1, 5]; // vec3 is NOT a substring of vec3

fn is_subvec(mainvec: &Vec<i32>, subvec: &Vec<i32>) -> bool {
    if subvec.len() == 0 { return true; }
    if mainvec.len() == 0 { return false; }

    'outer: for i in 0..mainvec.len() {
        for j in 0..subvec.len() {
            if mainvec[i+j] != subvec[j] {
                continue 'outer;
            }
        }
        return true;
    }
    return false;
}

println!("should be true: {}", is_subvec(&vec1, &vec2));
println!("should be false: {}", is_subvec(&vec1, &vec3));

I've seen How can I find a subsequence in a &[u8] slice?, but that's specifically for u8 and I want something that applies regardless of the type in the Vec.

like image 568
firechant Avatar asked Oct 31 '17 19:10

firechant


2 Answers

Rust doesn't include this in the standard library.

In general, this is the substring search problem which we can define over arbitrary alphabets. Depending on what properties we have available (only comparable, or also orderable) determines what algorithms we can use.

The benefit of using a substring search algorithm is that the function performs reasonably well for all inputs. The brute force search solution has a worst case that takes a time that is quadratic in the size of the input.

The “alphabet” of i32 values is orderable, so the Two Way algorithm (that the Rust standard library uses in str::find(&str) internally) could be adapted to implement this.

One algorithm which works for all equality comparable alphabets is Knuth-Morris-Pratt. It requires preprocessing the pattern we are searching for and requires space proportional to the pattern's length. It is also quite simple to implement.

I've written an implementation of the algorithm for generic elements for Rust @ bluss/knuth-morris-pratt, which as of this writing at least, is not published as a crate.


Well, christ. You might have nerd-sniped me pretty hard. I spent an inordinate amount of time researching algorithms for this that use no more than T: Eq and no more than constant space (meaning Rust core compatible). As of this writing, it's a crate you can use: galil-seiferas.

like image 125
bluss Avatar answered Sep 18 '22 23:09

bluss


To my knowledge, there's no function or method in the standard library to check if a slice is a subslice of another slice.

We can generalize and simplify your algorithm to (playground):

fn is_sub<T: PartialEq>(mut haystack: &[T], needle: &[T]) -> bool {
    if needle.len() == 0 { return true; }
    while !haystack.is_empty() {
        if haystack.starts_with(needle) { return true; }
        haystack = &haystack[1..];
    }
    false
}

fn main() {
    let vec1 = vec![1, 2, 3, 4, 5];
    let vec2 = vec![2, 3]; // vec2 IS a substring of vec1
    let vec3 = vec![1, 5]; // vec3 is NOT a substring of vec3

    println!("should be true: {}", is_sub(&vec1, &vec2));
    println!("should be false: {}", is_sub(&vec1, &vec3));
}

The logic is that we check if the needle slice is a prefix of the haystack slice, if it is not, we "remove" one element from the haystack. If the haystack ends up empty in the end, it is not a subslice.

With the help of windows we can shorten is_sub further with an approach from functional programming:

fn is_sub<T: PartialEq>(haystack: &[T], needle: &[T]) -> bool {
    haystack.windows(needle.len()).any(|c| c == needle)
}

Notably, these aren't the fastest algorithms since they have a worst case complexity of O(n^2), but they work for all T: PartialEq. Algorithms such as Knuth-Morris-Pratt can be used to speed this up, but for small inputs O(n^2) may be better due to constant factors.

We can improve upon this situation by using a trait in conjunction with specialization (which requires nightly):

#![feature(specialization)]

pub trait HasPart {
    fn has_part(&self, rhs: &Self) -> bool;
}

impl<T: PartialEq> HasPart for [T] {
    default fn has_part(&self, rhs: &Self) -> bool {
        self.windows(rhs.len()).any(|curr| curr == rhs)
    }
}

impl HasPart for [u8] {
    fn has_part(&self, rhs: &Self) -> bool {
        unimplemented!() // use Knuth-Morris-Pratt
    }
}

fn main() {
    let vec1 = vec![1, 2, 3, 4, 5];
    let vec2 = vec![2, 3]; // vec2 IS a substring of vec1
    let vec3 = vec![1, 5]; // vec3 is NOT a substring of vec3

    println!("should be true: {}", vec1.has_part(&vec2));
    println!("should be false: {}", vec1.has_part(&vec3));
}
like image 22
Centril Avatar answered Sep 18 '22 23:09

Centril