
Remove accents from accented characters [duplicate]

Tags:

perl

I am looking for advice on which library and/or function I should use to convert international text to its plain English-character equivalent.

For example

Vous avez aimé l'épée offerte par les elfes à Frodon 

should be converted into

Vous avez aime l'epee offerte par les elfes a Frodon 
Ωmega asked Jul 10 '13 03:07



1 Answer

First, decompose the characters using Unicode::Normalize; then use a simple regex to delete all the diacritical marks. (I think simply matching all the non-spacing mark characters should do it, but there might be an obscure exception or two.)

Here's an example:

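use strict;
use warnings;
use utf8;    # the source file contains UTF-8 string literals

use Unicode::Normalize;

# Encode output as UTF-8 in case any non-ASCII characters survive
binmode STDOUT, ':encoding(UTF-8)';

my $test = "Vous avez aimé l'épée offerte par les elfes à Frodon";

# Split each accented letter into a base letter plus combining marks
my $decomposed = NFKD( $test );
# Delete the combining marks, leaving only the base letters
$decomposed =~ s/\p{NonspacingMark}//g;

print "$decomposed\n";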

Output:

Vous avez aime l'epee offerte par les elfes a Frodon
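
As for the "obscure exception or two": some letters, such as ø, ł, đ, ß, æ and œ, have no Unicode decomposition at all, so NFKD leaves them untouched and the regex never sees a mark to strip. A minimal sketch of one way to handle them follows. The `%fallback` table and the `strip_accents` name are hypothetical, the table is deliberately incomplete, and the chosen replacements (for example ß to "ss") are an assumption you may want to change:

```perl
use strict;
use warnings;
use utf8;    # the source file contains UTF-8 string literals

use Unicode::Normalize qw(NFKD);

# Hypothetical, incomplete fallback table for letters that have no
# Unicode decomposition. The replacements are an assumption.
my %fallback = (
    'ø' => 'o',  'Ø' => 'O',
    'ł' => 'l',  'Ł' => 'L',
    'đ' => 'd',  'Đ' => 'D',
    'æ' => 'ae', 'Æ' => 'AE',
    'œ' => 'oe', 'Œ' => 'OE',
    'ß' => 'ss',
);

sub strip_accents {
    my ($text) = @_;

    # Same two steps as above: decompose, then delete combining marks
    my $result = NFKD($text);
    $result =~ s/\p{NonspacingMark}//g;

    # Replace any remaining character that appears in the fallback table
    $result =~ s{(.)}{ $fallback{$1} // $1 }gse;

    return $result;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_accents("Søren, Łódź, Straße"), "\n";
```

Anything not covered by the table passes through unchanged, so it is worth checking the output against your real data before relying on it.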
friedo answered Oct 01 '22 22:10