 

Modifying chars in a String by index

Tags:

rust

I wrote a function to titlecase (first letter capitalized, all others lowercase) a borrowed String, but it ended up being more of a hassle than it feels like it should be.

fn titlecase_word(word: &mut String) {

    unsafe {
        let buffer = word.as_mut_vec().as_mut_slice();
        buffer[0] = std::char::to_uppercase(buffer[0] as char) as u8;

        for i in range(1, buffer.len()) {
            buffer[i] = std::char::to_lowercase(buffer[i] as char) as u8;
        }
    }
}

The unsafe block is particularly undesirable. Is there a nicer way to modify String contents by index?

asked Oct 24 '14 at 08:10 by user2981708


1 Answer

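Update: As of Rust 1.0.0-alpha, to_lowercase()/to_uppercase() became methods of the CharExt trait and the separate Ascii type was removed: ASCII operations were gathered in two traits, AsciiExt and OwnedAsciiExt, which were marked unstable at the time. In stable Rust, to_uppercase() and to_lowercase() are inherent methods on char that return iterators rather than a single char (one character can map to several), AsciiExt is deprecated in favor of inherent methods on str, [u8] and u8, and OwnedAsciiExt no longer exists.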


Your code is incorrect because it accesses individual bytes to perform character-based operations, but in UTF-8 a character can occupy more than one byte (up to four). It won't work correctly for anything that is not ASCII.
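A minimal sketch that makes the mismatch visible, using the arbitrary sample word "élan": 'é' is two bytes in UTF-8, and casting its first byte to char produces an unrelated character.

```rust
fn main() {
    let word = "élan";

    // 'é' (U+00E9) is encoded as the two bytes 0xC3 0xA9,
    // so the byte length and the character count differ.
    assert_eq!(word.len(), 5);
    assert_eq!(word.chars().count(), 4);

    // Reinterpreting the first byte as a char gives U+00C3 'Ã', not 'é'.
    let first_byte_as_char = word.as_bytes()[0] as char;
    assert_eq!(first_byte_as_char, 'Ã');

    println!("{} bytes, {} chars", word.len(), word.chars().count());
}
```

This is exactly what happens in the question's code when `buffer[0] as char` is applied to a non-ASCII word.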

In fact, there is no way to do this correctly in place, because case conversion may change the number of bytes a character occupies, which can require reallocating the whole string. You should iterate over the characters and collect them into a new string:

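fn titlecase_word(word: &mut String) {
    if word.is_empty() { return; }

    let mut result = String::with_capacity(word.len());

    {
        let mut chars = word.chars();
        // to_uppercase()/to_lowercase() return iterators, because one char
        // may map to several (e.g. 'ß' uppercases to "SS"), so use extend().
        result.extend(chars.next().unwrap().to_uppercase());

        for c in chars {
            result.extend(c.to_lowercase());
        }
    }

    *word = result;
}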

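The claim that case conversion can change a character's encoded size is easy to check. A small sketch using two arbitrary sample characters: dotless 'ı' shrinks from two bytes to one when uppercased, and 'ß' expands to two characters.

```rust
fn main() {
    // U+0131 LATIN SMALL LETTER DOTLESS I takes 2 bytes in UTF-8,
    // but its uppercase form 'I' takes only 1.
    let dotless_i = "ı";
    let upper = dotless_i.to_uppercase();
    assert_eq!(dotless_i.len(), 2);
    assert_eq!(upper, "I");
    assert_eq!(upper.len(), 1);

    // 'ß' uppercases to two characters, "SS".
    assert_eq!("ß".to_uppercase(), "SS");

    println!("{:?} -> {:?}, {:?} -> {:?}", dotless_i, upper, "ß", "ß".to_uppercase());
}
```

Because the output may be shorter or longer than the input, overwriting bytes at fixed indices cannot be correct in general.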

Because you need to generate a new string anyway, it is better to just return it instead of replacing the old one. In this case it is also better to take a string slice (&str) as the argument:

fn titlecase_word(word: &str) -> String {
    let mut result = String::with_capacity(word.len());

    if !word.is_empty() {
        let mut chars = word.chars();
        // Case-mapping iterators are appended char by char with extend()
        result.extend(chars.next().unwrap().to_uppercase());

        for c in chars {
            result.extend(c.to_lowercase());
        }
    }

    result
}

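To illustrate why &str is the more flexible parameter type, here is a sketch of a stable-Rust variant of the returning version, called with a string literal, a &String, and a substring of a larger string (the sample strings are arbitrary):

```rust
fn titlecase_word(word: &str) -> String {
    let mut result = String::with_capacity(word.len());
    let mut chars = word.chars();
    if let Some(first) = chars.next() {
        result.extend(first.to_uppercase());
        for c in chars {
            result.extend(c.to_lowercase());
        }
    }
    result
}

fn main() {
    // A string literal is already a &str.
    let a = titlecase_word("hELLO");

    // A &String coerces to &str through Deref; `owned` itself is left unchanged.
    let owned = String::from("wORLD");
    let b = titlecase_word(&owned);

    // A sub-slice of a larger string works too, without copying it first.
    let sentence = "rUST iS fUN";
    let c = titlecase_word(&sentence[5..7]);

    assert_eq!(a, "Hello");
    assert_eq!(b, "World");
    assert_eq!(c, "Is");
    assert_eq!(owned, "wORLD");
    println!("{} {} {}", a, b, c);
}
```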

String also implements the Extend trait, so its extend() method offers a more idiomatic alternative to the for loop:

fn titlecase_word(word: &str) -> String {
    let mut result = String::with_capacity(word.len());

    if !word.is_empty() {
        let mut chars = word.chars();
        result.extend(chars.next().unwrap().to_uppercase());
        // flat_map flattens each ToLowercase iterator into individual chars
        result.extend(chars.flat_map(|c| c.to_lowercase()));
    }

    result
}

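On stable Rust, to_lowercase() yields an iterator rather than a single char, which is why extend() needs flat_map() rather than map(). A short sketch with a character whose lowercase form is two chars ('İ', chosen as an example):

```rust
fn main() {
    // 'İ' (U+0130) lowercases to two chars: 'i' followed by U+0307 COMBINING DOT ABOVE.
    let lower: Vec<char> = 'İ'.to_lowercase().collect();
    assert_eq!(lower, vec!['i', '\u{307}']);

    // String implements Extend<char>; flat_map turns each ToLowercase iterator
    // into a flat stream of chars that extend() appends.
    let mut s = String::new();
    s.extend("AİB".chars().flat_map(char::to_lowercase));
    assert_eq!(s, "ai\u{307}b");

    println!("{:?}", s);
}
```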

In fact, with iterators it is possible to shorten it even further:

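fn titlecase_word(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        // Uppercase the first char, then chain the lowercased rest
        Some(first) => first.to_uppercase().chain(chars.flat_map(char::to_lowercase)).collect(),
        None => String::new(),
    }
}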

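As a usage sketch, the iterator version composes with other iterator adaptors. For example, a hypothetical titlecase_sentence helper (not part of the original answer) could titlecase each whitespace-separated word and join the results with single spaces:

```rust
fn titlecase_word(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().chain(chars.flat_map(char::to_lowercase)).collect(),
        None => String::new(),
    }
}

// Hypothetical helper: titlecases every whitespace-separated word.
// Runs of whitespace collapse to a single space in the output.
fn titlecase_sentence(sentence: &str) -> String {
    sentence
        .split_whitespace()
        .map(titlecase_word)
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Prints "Hello World"
    println!("{}", titlecase_sentence("hELLO   wORLD"));
}
```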

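If you know in advance that you're working with ASCII, however, you can use the ASCII-only case conversion methods (originally provided by traits in the std::ascii module; in stable Rust they are inherent methods on str, [u8] and u8):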

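fn titlecase_word(word: String) -> String {
    // Panics if the input contains any non-ASCII character.
    assert!(word.is_ascii());

    // into_bytes() reuses the String's buffer; no new allocation.
    let mut result = word.into_bytes();
    result.make_ascii_lowercase();
    if let Some(first) = result.first_mut() {
        first.make_ascii_uppercase();
    }

    // ASCII case changes keep every byte ASCII, so this is valid UTF-8.
    String::from_utf8(result).unwrap()
}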


This function panics if the input string contains any non-ASCII character.

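This function doesn't allocate anything: it reuses the buffer of the String it takes by value and modifies the contents in place. However, with the consuming into_ascii_lowercase() API available when this answer was written, you couldn't write such a function with a single &mut String argument without unsafe and without extra allocations, because it would require moving out of a &mut reference, which is disallowed.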
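On stable Rust, the ASCII case-conversion methods operate through &mut str, so an in-place version taking &mut String is possible for ASCII input without unsafe and without allocating. A minimal sketch (the name titlecase_ascii_in_place is hypothetical):

```rust
// Hypothetical in-place variant for ASCII-only input.
fn titlecase_ascii_in_place(word: &mut String) {
    // Panics on non-ASCII input, like the consuming ASCII version.
    assert!(word.is_ascii());

    // Both calls work through &mut str, so nothing is moved or reallocated.
    word.make_ascii_lowercase();
    // 0..1 is a valid char boundary because the string is ASCII;
    // get_mut returns None for an empty string.
    if let Some(first) = word.get_mut(0..1) {
        first.make_ascii_uppercase();
    }
}

fn main() {
    let mut s = String::from("hELLO");
    titlecase_ascii_in_place(&mut s);
    assert_eq!(s, "Hello");

    let mut empty = String::new();
    titlecase_ascii_in_place(&mut empty);
    assert_eq!(empty, "");
}
```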

You could use std::mem::swap() and a temporary variable holding an empty string, though: it won't require unsafe, and since String::new() does not allocate, it won't require an allocation either, so such a function can be written, although the code is somewhat cumbersome. Anyway, &mut arguments are not really idiomatic for this kind of function in Rust.
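A sketch of the std::mem::swap() approach, restating the consuming ASCII-only titlecase_word(String) -> String so the block compiles on its own; the wrapper name titlecase_word_in_place is hypothetical:

```rust
use std::mem;

// Consuming ASCII titlecaser that reuses the String's buffer.
fn titlecase_word(word: String) -> String {
    assert!(word.is_ascii());
    let mut bytes = word.into_bytes();
    bytes.make_ascii_lowercase();
    if let Some(first) = bytes.first_mut() {
        first.make_ascii_uppercase();
    }
    String::from_utf8(bytes).unwrap()
}

// Hypothetical &mut wrapper: swap an empty String (String::new() does not
// allocate) into *word so the original can be moved out, then store the result back.
fn titlecase_word_in_place(word: &mut String) {
    let mut owned = String::new();
    mem::swap(word, &mut owned);
    *word = titlecase_word(owned);
}

fn main() {
    let mut s = String::from("wORLD");
    titlecase_word_in_place(&mut s);
    assert_eq!(s, "World");
}
```

On Rust 1.40 and later, `let owned = mem::take(word);` performs the same swap-with-an-empty-String in a single call.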

answered Oct 21 '22 at 22:10 by Vladimir Matveev