
Remove Emojis from String Rust

Tags: rust, emoji

How would I remove emojis from a string like "⚑hel✅🙂lo🙂"?

I know you'd need to use Regex and a few other things, but I'm not sure how to write the syntax and replace everything in the string.

Thanks, help is really appreciated.

XtremeDevX asked Oct 29 '25 17:10


2 Answers

It took me some time to figure out, but here's the solution:

use regex::Regex;

/// Removes emojis from a string.
///
/// Note: the last range in the pattern (U+24C2 to U+1F251) is very broad. It
/// also covers CJK ideographs, kana and Hangul, so Chinese characters are
/// removed too.
///
/// # Arguments
///
/// * `string` - String with emojis
///
/// # Returns
///
/// * `String` - De-emojified string
///
/// # Examples
///
/// ```
/// // Remove all emojis from this string
/// let demojified_string = demoji(String::from("⚑hel✅🙂lo🙂"));
/// // Output: `hello`
/// ```
pub fn demoji(string: String) -> String {
    let regex = Regex::new(concat!(
        "[",
        "\u{01F600}-\u{01F64F}", // emoticons
        "\u{01F300}-\u{01F5FF}", // symbols & pictographs
        "\u{01F680}-\u{01F6FF}", // transport & map symbols
        "\u{01F1E0}-\u{01F1FF}", // regional indicators (flags)
        "\u{002702}-\u{0027B0}", // dingbats
        "\u{0024C2}-\u{01F251}", // enclosed characters and everything up to U+1F251
        "]+",
    ))
    .unwrap();

    regex.replace_all(&string, "").to_string()
}
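
To see which characters those ranges actually catch, here is a rough std-only sketch that applies the same code point ranges with a plain `chars()` filter instead of the regex crate. The names `RANGES`, `in_emoji_ranges` and `demoji_chars` are made up for this illustration and are not part of the answer's code. It suggests that the range U+24C2 to U+1F251 is wide enough to swallow Chinese characters as well.

```rust
/// Code point ranges copied from the regex character class above.
const RANGES: [(u32, u32); 6] = [
    (0x1F600, 0x1F64F), // emoticons
    (0x1F300, 0x1F5FF), // symbols & pictographs
    (0x1F680, 0x1F6FF), // transport & map symbols
    (0x1F1E0, 0x1F1FF), // regional indicators (flags)
    (0x2702, 0x27B0),   // dingbats
    (0x24C2, 0x1F251),  // very broad: also spans CJK, kana, Hangul
];

/// Hypothetical helper: true if `c` falls inside any of the ranges.
fn in_emoji_ranges(c: char) -> bool {
    let cp = c as u32;
    RANGES.iter().any(|&(lo, hi)| (lo..=hi).contains(&cp))
}

/// Hypothetical std-only counterpart of `demoji`: drops every char in range.
fn demoji_chars(s: &str) -> String {
    s.chars().filter(|&c| !in_emoji_ranges(c)).collect()
}

fn main() {
    // U+2705 and U+1F642 are stripped, the ASCII letters are kept.
    println!("{:?}", demoji_chars("hel\u{2705}\u{1F642}lo\u{1F642}")); // "hello"
    // U+4E2D and U+6587 (Chinese) also fall inside U+24C2..=U+1F251.
    println!("{:?}", demoji_chars("\u{4E2D}\u{6587}")); // ""
}
```

If CJK text needs to survive, the last range would have to be split into narrower blocks. Which blocks to keep is a judgment call that depends on your input.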
XtremeDevX answered Nov 01 '25 12:11


I ran into the same question myself and came up with a more reliable approach than a regex.

You can't reliably use a regex, or otherwise filter out individual chars, because some characters don't count as emoji by themselves but may be part of one.

For example, if you put the symbols 🇬 and 🇧 one right after the other, they immediately combine into 🇬🇧, the flag of Great Britain, which is considered an emoji. But you probably don't consider those two individual characters to be emojis themselves.
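
A small std-only sketch of that point, assuming nothing beyond the standard library: the flag is a single visible symbol but two separate `char`s, both regional indicator symbols. The helper name `is_regional_indicator` is made up for this illustration.

```rust
/// Hypothetical helper: true for the 26 regional indicator symbols
/// (U+1F1E6 through U+1F1FF).
fn is_regional_indicator(c: char) -> bool {
    ('\u{1F1E6}'..='\u{1F1FF}').contains(&c)
}

/// Splits a string into its Unicode scalar values.
fn scalars(s: &str) -> Vec<char> {
    s.chars().collect()
}

fn main() {
    // Regional indicator G followed by regional indicator B renders as one flag.
    let flag = "\u{1F1EC}\u{1F1E7}";
    let chars = scalars(flag);

    // One flag on screen, two chars in the string.
    println!("{} has {} chars", flag, chars.len());
    for c in &chars {
        println!("U+{:X} regional indicator: {}", *c as u32, is_regional_indicator(*c));
    }
}
```

A filter that looks at one `char` at a time only ever sees the two halves, so it has to either strip every regional indicator or keep every flag.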

My method works by iterating over individual Unicode grapheme clusters, and then filtering out those that happen to be full emojis.

This relies on the unicode_segmentation and emojis crates, one for each of the two steps.

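pub fn remove_emoji(string: &str) -> String {
    use unicode_segmentation::UnicodeSegmentation;

    // Split into extended grapheme clusters so multi-char emojis stay whole.
    let graphemes = string.graphemes(true);

    // Keep a cluster only if the emojis crate does not recognise it as an emoji.
    let is_not_emoji = |x: &&str| emojis::get(x).is_none();

    graphemes.filter(is_not_emoji).collect()
}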
Architector 4 answered Nov 01 '25 12:11


