I have many 4KiB buffers, which have a 50% chance of containing only zero values. The non-zero buffers will typically have a non-zero byte early in the buffer.
fn is_zero(buf: &Vec<u8>) -> bool {
    for byte in buf.into_iter() {
        if *byte != 0 {
            return false;
        }
    }
    return true;
}
Is this a performant way of checking in Rust, compiled with --release? (I am processing many GBs of data.)
(In the C version, I cast the buffer to unsigned long long before checking. That probably wasn't the best I could do, given SSE etc.)
You can use align_to to convert the slice of u8 into a slice of u128, making the comparison more efficient:
fn is_zero(buf: &[u8]) -> bool {
    let (prefix, aligned, suffix) = unsafe { buf.align_to::<u128>() };
    prefix.iter().all(|&x| x == 0)
        && suffix.iter().all(|&x| x == 0)
        && aligned.iter().all(|&x| x == 0)
}
Running a simple benchmark on my machine shows a 16x performance gain!
#![feature(test)]
extern crate test;

fn v() -> Vec<u8> {
    std::iter::repeat(0).take(1000000).collect()
}

fn is_zero(buf: &[u8]) -> bool {
    buf.into_iter().all(|&b| b == 0)
}

fn is_zero_aligned(buf: &[u8]) -> bool {
    let (prefix, aligned, suffix) = unsafe { buf.align_to::<u128>() };
    prefix.iter().all(|&x| x == 0)
        && suffix.iter().all(|&x| x == 0)
        && aligned.iter().all(|&x| x == 0)
}

#[bench]
fn bench_is_zero(b: &mut test::Bencher) {
    let v = test::black_box(v());
    b.iter(|| is_zero(&v[..]))
}

#[bench]
fn bench_is_zero_aligned(b: &mut test::Bencher) {
    let v = test::black_box(v());
    b.iter(|| is_zero_aligned(&v[..]))
}
running 2 tests
test tests::bench_is_zero ... bench: 455,975 ns/iter (+/- 414)
test tests::bench_is_zero_aligned ... bench: 28,615 ns/iter (+/- 116)
Depending on your machine, different integer types (such as u64) may yield better performance.
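For instance, a u64 variant is a one-line change from the function above (a sketch; is_zero_u64 is my name for it, and the same align_to considerations apply):

```rust
fn is_zero_u64(buf: &[u8]) -> bool {
    // Sound in practice: every bit pattern is a valid u64, so reinterpreting
    // the byte slice cannot produce an invalid value.
    let (prefix, aligned, suffix) = unsafe { buf.align_to::<u64>() };
    prefix.iter().all(|&x| x == 0)
        && aligned.iter().all(|&x| x == 0)
        && suffix.iter().all(|&x| x == 0)
}
```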
Thanks to @Globi on the Rust Discord server for the idea.
Found a 4x speedup on my laptop with byteorder, by reading u64 at a time, in native endian.
extern crate byteorder;

use byteorder::{NativeEndian, ReadBytesExt};
use std::io::Cursor;

pub fn one(buf: &[u8]) -> bool {
    buf.into_iter().all(|&byte| byte == 0)
}

pub fn two(buf: &[u8]) -> bool {
    let mut cur = Cursor::new(buf);
    // Read whole u64 words until fewer than 8 bytes remain...
    while let Ok(val) = cur.read_u64::<NativeEndian>() {
        if val != 0 {
            return false;
        }
    }
    // ...then check the leftover tail byte by byte.
    while let Ok(val) = cur.read_u8() {
        if val != 0 {
            return false;
        }
    }
    true
}
#![feature(test)]
extern crate test;
extern crate zero_slice_8;

use zero_slice_8::{one, two};

fn v() -> Vec<u8> {
    let mut result = vec![];
    for _ in 0..100000 {
        result.push(0);
    }
    result
}

#[bench]
fn bench_one(b: &mut test::Bencher) {
    let v = v();
    b.iter(|| one(&v[..]))
}

#[bench]
fn bench_two(b: &mut test::Bencher) {
    let v = v();
    b.iter(|| two(&v[..]))
}
The following function is pure, safe Rust:
use std::convert::TryInto; // needed before the 2021 edition; now in the prelude

fn is_zero(slice: &[u8]) -> bool {
    for i in (0..slice.len()).step_by(16) {
        if slice.len() - i >= 16 {
            // Check a full 16-byte chunk as a single u128.
            let arr: [u8; 16] = slice[i..i + 16].try_into().expect("this should always succeed");
            if u128::from_be_bytes(arr) != 0 {
                return false;
            }
        } else {
            // Check the tail (fewer than 16 bytes) one byte at a time.
            for j in i..slice.len() {
                if slice[j] != 0 {
                    return false;
                }
            }
        }
    }
    return true;
}
Specifically, it uses the u128::from_be_bytes function to convert a [u8; 16] array to a u128, and uses the TryInto trait to turn a [u8] of appropriate length into a [u8; 16]; the rest is fairly trivial. Since the value is only compared against zero, the byte order is irrelevant (from_ne_bytes would avoid a byte swap on little-endian machines). It is possible to manually unroll the inner loop that handles the tail bytes left over when the length isn't a multiple of 16, but I doubt that would be a significant performance win.
Depending on the processor, using u64 or even u32 might be faster; one would have to profile that for oneself.
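Those narrower widths can be tried without unsafe code via chunks_exact (a sketch, assuming unaligned 8-byte loads are cheap on the target; is_zero_u64 is a hypothetical name):

```rust
use std::convert::TryInto; // in the prelude since the 2021 edition

fn is_zero_u64(slice: &[u8]) -> bool {
    let mut chunks = slice.chunks_exact(8);
    // Check the bulk of the slice eight bytes at a time...
    let body_zero = chunks
        .by_ref()
        .all(|c| u64::from_ne_bytes(c.try_into().unwrap()) == 0);
    // ...then the leftover tail (fewer than 8 bytes) byte by byte.
    body_zero && chunks.remainder().iter().all(|&b| b == 0)
}
```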