Rust provides a trim method for strings: str.trim(), which removes leading and trailing whitespace. I want a method that does the same for byte strings. It should take a Vec<u8> and remove leading and trailing whitespace (space, 0x20, and horizontal tab, 0x09).
Writing a trim_left() is easy; you can just use an iterator with skip_while():
fn main() {
    let a: &[u8] = b" fo o ";
    let b: Vec<u8> = a.iter().copied().skip_while(|x| x == &0x20 || x == &0x09).collect();
    println!("{:?}", b);
}
But to trim trailing whitespace, I would need to look ahead each time whitespace is found to check that no non-whitespace byte follows it.
Vec<u8> is like Box<[u8]>, except it additionally stores a "capacity" count, making it three machine words wide. Separately stored capacity allows for efficient resizing of the underlying array. It's the basis for String.
A contiguous growable array type, written as Vec<T>, short for 'vector'.
In Rust, there are several ways to initialize a vector. In order to initialize a vector via the new() method call, we use the double colon operator: let mut vec = Vec::new();
Here's an implementation that returns a slice, rather than a new Vec<u8>, as str::trim() does. It's also implemented on [u8], since that's more general than Vec<u8> (you can obtain a slice from a vector cheaply, but creating a vector from a slice is more costly, since it involves a heap allocation and a copy).
trait SliceExt {
    fn trim(&self) -> &Self;
}

impl SliceExt for [u8] {
    fn trim(&self) -> &[u8] {
        fn is_whitespace(c: &u8) -> bool {
            *c == b'\t' || *c == b' '
        }

        fn is_not_whitespace(c: &u8) -> bool {
            !is_whitespace(c)
        }

        if let Some(first) = self.iter().position(is_not_whitespace) {
            if let Some(last) = self.iter().rposition(is_not_whitespace) {
                &self[first..last + 1]
            } else {
                unreachable!();
            }
        } else {
            &[]
        }
    }
}

fn main() {
    let a = b" fo o ";
    let b = a.trim();
    println!("{:?}", b);
}
If you really need a Vec<u8> after the trim(), you can just call into() on the slice to turn it into a Vec<u8>.
fn main() {
    let a = b" fo o ";
    let b: Vec<u8> = a.trim().into();
    println!("{:?}", b);
}
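If the data already lives in an owned Vec<u8>, the extra allocation from into() can be avoided by trimming the vector in place. The following is only a minimal sketch of that idea, not part of the answer above: the name trim_in_place is hypothetical, and it assumes the same whitespace set as the question (space and tab).

```rust
/// Hypothetical helper (not from the answer above): trims spaces and tabs
/// from both ends of an owned buffer without allocating a new vector.
fn trim_in_place(buf: &mut Vec<u8>) {
    let is_ws = |b: &u8| *b == b' ' || *b == b'\t';

    // Cut the trailing run first so the later shift moves fewer bytes.
    // If every byte is whitespace, `end` is 0 and the buffer becomes empty.
    let end = buf.iter().rposition(|b| !is_ws(b)).map_or(0, |i| i + 1);
    buf.truncate(end);

    // Remove the leading run; `drain` shifts the remaining bytes to the front.
    let start = buf.iter().position(|b| !is_ws(b)).unwrap_or(0);
    buf.drain(..start);
}
```

Calling trim_in_place(&mut v) on a vector keeps its existing allocation; the trade-off is that removing the leading bytes has to move the remaining contents, so the slice-returning trim() is still the cheaper choice when a borrowed view is enough.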
This is a much simpler version than the other answers.
pub fn trim_ascii_whitespace(x: &[u8]) -> &[u8] {
    let from = match x.iter().position(|x| !x.is_ascii_whitespace()) {
        Some(i) => i,
        None => return &x[0..0],
    };
    let to = x.iter().rposition(|x| !x.is_ascii_whitespace()).unwrap();
    &x[from..=to]
}
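One thing to keep in mind: u8::is_ascii_whitespace() also matches line feed, form feed and carriage return, while the question only asks for space and tab. If that difference matters, the same position/rposition approach can take the predicate as a parameter. This is just a sketch; trim_by and trim_space_and_tab are hypothetical names, not standard-library functions.

```rust
/// Hypothetical generalization of the function above: trims every byte for
/// which `is_trimmed` returns true from both ends of the slice.
fn trim_by(x: &[u8], is_trimmed: impl Fn(u8) -> bool) -> &[u8] {
    let from = match x.iter().position(|&b| !is_trimmed(b)) {
        Some(i) => i,
        // Nothing but trimmed bytes (or an empty input): return an empty slice.
        None => return &x[0..0],
    };
    // A non-matching byte exists, so the search from the back cannot fail.
    let to = x.iter().rposition(|&b| !is_trimmed(b)).unwrap();
    &x[from..=to]
}

/// Trims only space (0x20) and horizontal tab (0x09), as in the question.
fn trim_space_and_tab(x: &[u8]) -> &[u8] {
    trim_by(x, |b| b == b' ' || b == b'\t')
}
```

With this variant, a leading or trailing newline is left untouched, whereas trim_ascii_whitespace() above would strip it.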
Weird that this isn't in the standard library. I would have thought it was a common task.
Anyway here it is as a complete file/trait (with tests!) that you can copy/paste.
use std::ops::Deref;

/// Trait to allow trimming ascii whitespace from a &[u8].
pub trait TrimAsciiWhitespace {
    /// Trim ascii whitespace (based on `is_ascii_whitespace()`) from the
    /// start and end of a slice.
    fn trim_ascii_whitespace(&self) -> &[u8];
}

impl<T: Deref<Target = [u8]>> TrimAsciiWhitespace for T {
    fn trim_ascii_whitespace(&self) -> &[u8] {
        let from = match self.iter().position(|x| !x.is_ascii_whitespace()) {
            Some(i) => i,
            None => return &self[0..0],
        };
        let to = self.iter().rposition(|x| !x.is_ascii_whitespace()).unwrap();
        &self[from..=to]
    }
}
#[cfg(test)]
mod test {
    use super::TrimAsciiWhitespace;

    // Byte-string literals have type `&[u8; N]`, which derefs to `[u8; N]`
    // rather than `[u8]`, so they are converted to slices before trimming.
    #[test]
    fn basic_trimming() {
        assert_eq!(b" A ".as_slice().trim_ascii_whitespace(), b"A");
        assert_eq!(b" AB ".as_slice().trim_ascii_whitespace(), b"AB");
        assert_eq!(b"A ".as_slice().trim_ascii_whitespace(), b"A");
        assert_eq!(b"AB ".as_slice().trim_ascii_whitespace(), b"AB");
        assert_eq!(b" A".as_slice().trim_ascii_whitespace(), b"A");
        assert_eq!(b" AB".as_slice().trim_ascii_whitespace(), b"AB");
        assert_eq!(b" A B ".as_slice().trim_ascii_whitespace(), b"A B");
        assert_eq!(b"A B ".as_slice().trim_ascii_whitespace(), b"A B");
        assert_eq!(b" A B".as_slice().trim_ascii_whitespace(), b"A B");
        assert_eq!(b" ".as_slice().trim_ascii_whitespace(), b"");
        assert_eq!(b" ".as_slice().trim_ascii_whitespace(), b"");
    }
}
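For what it's worth, newer toolchains do ship this in the standard library: as far as I know, trim_ascii(), trim_ascii_start() and trim_ascii_end() have been stable on [u8] since Rust 1.80. The sketch below assumes such a toolchain; the wrapper names are hypothetical and only there to make the calls easy to test. Like the trait above, these methods use the is_ascii_whitespace() definition, so newlines are trimmed as well.

```rust
/// Hypothetical wrapper: trims ASCII whitespace from both ends using the
/// standard-library method (assumes Rust 1.80 or newer).
fn trim_with_std(bytes: &[u8]) -> &[u8] {
    bytes.trim_ascii()
}

/// Hypothetical wrapper: trims only the leading ASCII whitespace.
fn trim_start_with_std(bytes: &[u8]) -> &[u8] {
    bytes.trim_ascii_start()
}

/// Hypothetical wrapper: trims only the trailing ASCII whitespace.
fn trim_end_with_std(bytes: &[u8]) -> &[u8] {
    bytes.trim_ascii_end()
}
```

On older compilers the trait above remains the way to go.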