Comparing a character in a Rust string using indexing

Tags: iterator, string, rust

I want to read lines from "input.txt" and keep only those that don't have the # (comment) symbol at the start of the line. I wrote this code:

use std::io::{BufRead, BufReader};
use std::fs::File;

fn main() {
    let file = BufReader::new(File::open("input.txt").unwrap());
    let lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    let mut iter = lines.iter().filter(|&x| x.chars().next() != "#".chars().next());
    println!("{}", iter.next().unwrap());
}

But this line

|&x| x.chars().next() != "#".chars().next()

smells bad to me. I'd like it to look like |x| x[0] == "#", and with this approach I also can't easily check the second character in the string.

So how can I refactor this code?


asked Oct 13 '14 18:10

Pavlo Razumovskyi

1 Answer

Rust strings are stored as a sequence of bytes representing characters in UTF-8 encoding. UTF-8 is a variable-width encoding, so byte indexing can leave you in the middle of a character, which is obviously unsafe. On the other hand, getting a code point by index is an O(n) operation, because the string has to be decoded from the start. Moreover, indexing code points is not what you really want to do, because there are code points which do not even have characters of their own, like combining diacritics or other modifiers. Indexing grapheme clusters is closer to the correct approach, but it is usually needed only in text rendering or, perhaps, language processing.
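A small illustration of the gap between bytes and code points, using only the standard library. The helper `byte_and_char_counts` is hypothetical; the two strings both render as "é", once as a single precomposed code point and once as 'e' plus a combining accent.

```rust
/// Returns (number of UTF-8 bytes, number of code points) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // "é" as one precomposed code point, U+00E9.
    let precomposed = "\u{e9}";
    // "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    let decomposed = "e\u{301}";

    // Both look like one character, but the counts differ.
    println!("{:?}", byte_and_char_counts(precomposed)); // (2, 1)
    println!("{:?}", byte_and_char_counts(decomposed)); // (3, 2)

    // `chars().nth(i)` decodes from the start of the string, so it is O(i).
    // The second code point of `decomposed` is the combining accent itself,
    // which is not a character on its own.
    assert_eq!(decomposed.chars().nth(1), Some('\u{301}'));
}
```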

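In other words, indexing a string is hard to define properly, and what most people usually expect from it is wrong. Hence Rust does not provide a generic index operation on strings: s[i] with an integer index does not compile. Only byte-range slicing such as &s[0..1] is supported, and it panics if a range boundary falls inside a character.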
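To see the range-slicing rule in action without risking a panic, `str::get` accepts the same ranges as `&s[..]` but returns `None` where slicing would panic. A minimal sketch; `first_n_bytes` is a hypothetical helper name.

```rust
/// Returns the first `n` bytes of `s` as a `&str`, or `None` if byte
/// offset `n` is out of range or falls inside a multi-byte character.
fn first_n_bytes(s: &str, n: usize) -> Option<&str> {
    // `get` is the non-panicking counterpart of `&s[..n]`.
    s.get(..n)
}

fn main() {
    // "été": each 'é' (U+00E9) takes 2 bytes, 't' takes 1.
    let s = "\u{e9}t\u{e9}";

    // `s[0]` would not compile: `str` cannot be indexed by an integer.
    // Two bytes cover exactly the first 'é', so this succeeds.
    assert_eq!(first_n_bytes(s, 2), Some("\u{e9}"));
    // Byte 1 is inside the first 'é': `get` says None, `&s[..1]` would panic.
    assert_eq!(first_n_bytes(s, 1), None);
    println!("{}", s.is_char_boundary(1)); // false
}
```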

Occasionally, however, you do need to index strings, for example when you know in advance that your string contains only ASCII characters, or when you are working with binary data. For these cases Rust, of course, provides all the necessary means.

First, you can always obtain a view of the underlying sequence of bytes. &str has an as_bytes() method, which returns &[u8], the slice of bytes the string consists of. Then you can use the usual indexing operation:

x.as_bytes()[0] != b'#'

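Note the special notation: b'#' means "ASCII character # of type u8", i.e. it is a byte character literal. (Also note that you don't need to write "#".chars().next() to get the character #: since chars().next() returns Option<char>, you can compare against Some('#'), using the plain character literal '#'.) Be careful, however: &str is a UTF-8-encoded string, so the first character can consist of more than one byte, and in general a byte is not a character. The comparison with b'#' itself is still reliable, because every byte of a multi-byte UTF-8 sequence is 0x80 or greater and so can never equal an ASCII byte, but this trick does not let you inspect non-ASCII characters. Also, as_bytes()[0] panics on an empty line.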
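For the original goal of skipping comment lines, a character-based check with the '#' literal may be all that is needed, and chars().nth(1) reaches the second character the question asks about. A sketch under those assumptions: `is_not_comment` and `second_char` are hypothetical helpers, and `str::starts_with` is a standard-library alternative that the answer above does not itself use.

```rust
/// True if `line` does not start with the comment character '#'.
/// `chars().next()` is `None` for an empty line, so blank lines are kept.
fn is_not_comment(line: &str) -> bool {
    line.chars().next() != Some('#')
}

/// The second character of `line`, if any (decodes from the start: O(n)).
fn second_char(line: &str) -> Option<char> {
    line.chars().nth(1)
}

fn main() {
    let lines = ["# comment", "127.0.0.1 localhost", "", "#!shebang-like"];
    let kept: Vec<&str> = lines.iter().copied().filter(|l| is_not_comment(l)).collect();
    println!("{:?}", kept); // ["127.0.0.1 localhost", ""]

    // `starts_with` accepts a char pattern and expresses the same test.
    assert_eq!(is_not_comment("# x"), !"# x".starts_with('#'));

    println!("{:?}", second_char("#!")); // Some('!')
}
```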

The proper way to handle ASCII data in Rust is to use the ascii crate. Its AsAsciiStr trait provides the as_ascii_str() method, which converts &str to &AsciiStr, or returns an error if the string contains non-ASCII characters. Then you can use it like this:

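// Needs `ascii` listed under [dependencies] in Cargo.toml.
// On the 2015 edition you also need `extern crate ascii;`.
use ascii::{AsAsciiStr, AsciiChar};

// ...

// as_ascii_str() returns Err for non-ASCII input, so unwrap() panics there;
// indexing with [0] also panics on an empty string.
x.as_ascii_str().unwrap()[0] != AsciiChar::Hash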

This way you need slightly more typing, but you get much more safety in return, because as_ascii_str() checks that you work with ASCII data only.
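If adding a crate is not an option, the standard library's str::is_ascii gives a similar up-front check, after which byte indexing is character indexing. A minimal sketch; `ascii_byte_at` is a hypothetical helper, and unlike the ascii crate it returns plain u8 values rather than a dedicated character type.

```rust
/// Returns the byte at `i` if `s` is entirely ASCII, else `None`.
/// For ASCII text each byte is exactly one character, so byte indexing is
/// character indexing. An out-of-range `i` also yields `None`.
fn ascii_byte_at(s: &str, i: usize) -> Option<u8> {
    if s.is_ascii() {
        s.as_bytes().get(i).copied()
    } else {
        None
    }
}

fn main() {
    println!("{}", ascii_byte_at("#comment", 0) == Some(b'#')); // true
    println!("{:?}", ascii_byte_at("#comment", 1)); // Some(99), i.e. b'c'
    println!("{:?}", ascii_byte_at("#caf\u{e9}", 0)); // None: not ASCII
}
```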

Sometimes, however, you just want to work with binary data, without really interpreting it as characters, even if the source contains some ASCII characters. This can happen, for example, when you're writing a parser for some markup language like Markdown. In this case you can treat the whole input as a sequence of bytes:

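use std::fs::File;
use std::io::{BufReader, Read};

fn main() {
    let mut file = BufReader::new(File::open("/etc/hosts").unwrap());
    // Read the whole file as raw bytes, without any UTF-8 validation.
    let mut buf = Vec::new();
    file.read_to_end(&mut buf).unwrap();
    // Split on b'\n' and drop lines whose first byte is b'#'.
    // `first()` returns None for empty lines, so blank lines don't panic.
    let mut iter = buf
        .split(|&c| c == b'\n')
        .filter(|line| line.first() != Some(&b'#'));
    // Prints the first non-comment line as a list of byte values.
    println!("{:?}", iter.next().unwrap());
}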
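The same byte-level filter can be pulled into a function that works on any in-memory buffer, which makes it easy to try without a real file. A sketch; `non_comment_lines` is a hypothetical name, and the sample data only stands in for a file such as /etc/hosts.

```rust
/// Splits `buf` on b'\n' and keeps the lines whose first byte is not b'#'.
/// Empty lines (including the one after a trailing newline) are kept.
fn non_comment_lines(buf: &[u8]) -> Vec<&[u8]> {
    buf.split(|&c| c == b'\n')
        .filter(|line| line.first() != Some(&b'#'))
        .collect()
}

fn main() {
    // In-memory stand-in for a hosts-style file.
    let data = b"# comment\n127.0.0.1 localhost\n\n#another\n::1 localhost\n";
    for line in non_comment_lines(data) {
        // Lossy conversion is only for display; filtering works on raw bytes.
        println!("{:?}", String::from_utf8_lossy(line));
    }
}
```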


answered Sep 21 '22 19:09

Vladimir Matveev

