 

Does Rust provide a way to parse integer numbers directly from ASCII data in byte (u8) arrays?

Tags:

string

rust

Rust has FromStr; however, as far as I can see, it only takes Unicode text (&str) as input. Is there an equivalent for [u8] arrays?

By "parse" I mean take ASCII characters and return an integer, like C's atoi does.

Or do I need to either...

  • Convert the u8 array to a string first, then call FromStr.
  • Call out to libc's atoi.
  • Write an atoi in Rust.
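For the third option, a hand-written parser only takes a few lines. The following is a minimal sketch, not a standard API (the name atoi_i64 is hypothetical). Roughly like C's atoi, it skips leading ASCII whitespace, accepts an optional sign and stops at the first non-digit. Unlike atoi, whose behaviour is undefined when the result does not fit, it returns None on overflow or when no digits are present.

```rust
/// Hypothetical atoi-like parser for i64: skips leading ASCII whitespace,
/// accepts an optional '+' or '-', then reads digits up to the first non-digit.
/// Returns None if there are no digits or the value does not fit in i64.
fn atoi_i64(bytes: &[u8]) -> Option<i64> {
    let mut i = 0;
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    let negative = match bytes.get(i).copied() {
        Some(b'-') => {
            i += 1;
            true
        }
        Some(b'+') => {
            i += 1;
            false
        }
        _ => false,
    };
    let start = i;
    let mut value: i64 = 0;
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        let digit = i64::from(bytes[i] - b'0');
        // Accumulate toward the sign so that i64::MIN parses without overflowing.
        value = value.checked_mul(10)?;
        value = if negative {
            value.checked_sub(digit)?
        } else {
            value.checked_add(digit)?
        };
        i += 1;
    }
    // No digits after the optional whitespace and sign: nothing was parsed.
    if i == start { None } else { Some(value) }
}

fn main() {
    println!("{:?}", atoi_i64(b"  -42 is the answer")); // prints Some(-42)
}
```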

In nearly all cases the first option is reasonable; however, there are cases where files may be very large with no predefined encoding, or contain mixed binary and text, where it's most straightforward to read integer numbers directly as bytes.
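For the mixed binary/text case, the buffer as a whole is usually not valid UTF-8, but a run of ASCII digits inside it always is, so only that run needs converting. A minimal sketch, assuming a made-up record layout where the number starts at a known offset (the helper number_at and the sample bytes are illustrative only):

```rust
use std::str;

/// Hypothetical helper: parses the run of ASCII digits starting at `offset`.
/// Returns None if the offset is out of range or no digits are found there.
fn number_at(data: &[u8], offset: usize) -> Option<u32> {
    let rest = data.get(offset..)?;
    // The digit run ends at the first non-digit byte, or at the end of the buffer.
    let end = rest.iter().position(|b| !b.is_ascii_digit()).unwrap_or(rest.len());
    // ASCII digits are always valid UTF-8, so only this sub-slice is converted.
    str::from_utf8(&rest[..end]).ok()?.parse().ok()
}

fn main() {
    // Made-up record: binary bytes, the ASCII digits "640", then more binary.
    let data: &[u8] = &[0xFF, 0x00, b'6', b'4', b'0', 0xFE, 0x01];
    // The whole buffer is not UTF-8 (0xFF never occurs in UTF-8)...
    assert!(str::from_utf8(data).is_err());
    // ...but the digit run inside it parses fine.
    println!("{:?}", number_at(data, 2)); // prints Some(640)
}
```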

asked Sep 09 '16 by ideasman42




2 Answers

No, the standard library has no such feature, but it doesn't need one.

As stated in the comments, the raw bytes can be converted to a &str via:

  1. str::from_utf8
  2. str::from_utf8_unchecked

Neither of these performs an extra allocation. The first ensures the bytes are valid UTF-8; the second does not, which is why it is an unsafe function. Everyone should use the checked form until profiling proves that it's a bottleneck, and switch to the unchecked form only once it's proven safe to do so.
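Because from_utf8_unchecked leaves it to the caller to guarantee valid UTF-8, its use has to be justified by a check the caller performs. A minimal sketch of one way to uphold that contract (the helper parse_digits_unchecked is hypothetical): verify that every byte is an ASCII digit, then skip the UTF-8 validation. The digit check here costs about as much as from_utf8 itself, so this mainly illustrates the safety argument rather than a guaranteed speed-up.

```rust
use std::str;

/// Hypothetical helper: parses a slice that must consist only of ASCII digits.
/// Returns None for empty input, any non-digit byte, or a value too large for u64.
fn parse_digits_unchecked(bytes: &[u8]) -> Option<u64> {
    if bytes.is_empty() || !bytes.iter().all(u8::is_ascii_digit) {
        return None;
    }
    // SAFETY: every byte was just checked to be an ASCII digit, and ASCII is valid UTF-8.
    let s = unsafe { str::from_utf8_unchecked(bytes) };
    s.parse().ok()
}

fn main() {
    // The checked conversion reports invalid UTF-8 as an error instead.
    assert!(str::from_utf8(b"\xFF12").is_err());
    assert_eq!(parse_digits_unchecked(b"231"), Some(231));
    assert_eq!(parse_digits_unchecked(b"2a1"), None);
}
```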

If bytes deeper in the data need to be parsed, a slice of the raw bytes can be obtained before conversion:

use std::str;

fn main() {
    let raw_data = b"123132";

    let the_bytes = &raw_data[1..4];
    let the_string = str::from_utf8(the_bytes).expect("not UTF-8");
    let the_number: u64 = the_string.parse().expect("not a number");

    assert_eq!(the_number, 231);
}

As with other code, these lines can be extracted into a function or a trait for reuse. However, once you go down that path, it would be a good idea to look into one of the many crates aimed at parsing, especially if binary data needs to be parsed in addition to textual data.
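Extracting those lines into a reusable trait might look like the following sketch. The trait ParseBytes and its method parse_bytes are hypothetical names, not part of the standard library; the implementation on [u8] works for any target type that implements FromStr.

```rust
use std::str::{self, FromStr};

/// Hypothetical extension trait: parse a byte slice as any FromStr type.
trait ParseBytes {
    fn parse_bytes<T: FromStr>(&self) -> Option<T>;
}

impl ParseBytes for [u8] {
    fn parse_bytes<T: FromStr>(&self) -> Option<T> {
        // Reject non-UTF-8 input, then defer to the target type's FromStr impl.
        str::from_utf8(self).ok()?.parse().ok()
    }
}

fn main() {
    let raw_data = b"123132";
    let n: Option<u64> = raw_data[1..4].parse_bytes();
    assert_eq!(n, Some(231));
    assert_eq!(b"-7".parse_bytes::<i32>(), Some(-7));
    assert_eq!(b"\xFF".parse_bytes::<u8>(), None);
}
```

Returning Option keeps the sketch short but discards why parsing failed; a fuller version could return an error type that distinguishes invalid UTF-8 from an invalid number.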

answered Oct 26 '22 by Shepmaster


I do not know of any way in the standard library, but maybe the atoi crate works for you? Full disclosure: I am its author.

use atoi::FromRadix10;

let (number, digits) = u32::from_radix_10(b"42 is the answer"); // returns (42, 2)

You can check whether the second element of the tuple is zero to see if the slice starts with a digit.

let (number, digits) = u32::from_radix_10(b"x"); // returns (0, 0)
let (number, digits) = u32::from_radix_10(b"0"); // returns (0, 1)
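The (number, digits) pair also makes it easy to walk through a buffer that holds several numbers: advance by digits when a number was found, or by one byte when it was not. The sketch below does not use the crate; parse_prefix is a hypothetical std-only stand-in with a similar return convention, except that it returns None on u64 overflow, which makes all_numbers (also hypothetical) give up on the whole buffer.

```rust
/// Hypothetical std-only stand-in: parses leading ASCII digits and returns
/// (value, digits consumed), or None if the value overflows u64.
fn parse_prefix(text: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    let mut digits = 0;
    for &b in text {
        if !b.is_ascii_digit() {
            break;
        }
        value = value.checked_mul(10)?.checked_add(u64::from(b - b'0'))?;
        digits += 1;
    }
    Some((value, digits))
}

/// Collects every run of digits in `data`, skipping all other bytes.
fn all_numbers(data: &[u8]) -> Option<Vec<u64>> {
    let mut numbers = Vec::new();
    let mut pos = 0;
    while pos < data.len() {
        let (value, digits) = parse_prefix(&data[pos..])?;
        if digits == 0 {
            pos += 1; // not a digit: skip this byte
        } else {
            numbers.push(value);
            pos += digits; // jump past the digits just consumed
        }
    }
    Some(numbers)
}

fn main() {
    // Text fields mixed with non-UTF-8 bytes.
    let data = b"w=640;h=480\xFF\x00depth=24";
    assert_eq!(all_numbers(data), Some(vec![640, 480, 24]));
}
```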
answered Oct 27 '22 by Markus Klein