How can I convert non-ASCII characters encoded in UTF-8 to their ASCII equivalents in Perl?

I have a Perl script that is being called by third parties to send me names of people who have registered my software. One of these parties encodes the names in UTF-8, so I have adapted my script accordingly to decode UTF-8 to ASCII with Encode::decode_utf8(...).

This usually works fine, but every 6 months or so one of the names contains Cyrillic, Greek or Romanian characters, so decoding the name results in garbage characters such as "ÐŸÐ¾Ð´Ñ€Ð°Ð¶Ð°Ð½ÑÐºÐ°Ñ". I then have to follow up with the customer and ask for a "Latin character version" of the name in order to issue a registration code.

So, is there any Perl module that can detect whether a name contains such characters and, if necessary, automatically translate them to their closest ASCII representation?

It seems that I can use Lingua::Cyrillic::Translit::ICAO plus Lingua::DetectCharset to handle Cyrillic, but I would prefer something that works with other character sets as well.

701

asked Mar 12 '09 10:03

Adrian Grigore

1 Answer

I believe you could use Text::Unidecode for this; that is precisely what it tries to do.
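A minimal sketch of how that might look, assuming the names arrive as raw UTF-8 bytes and that Text::Unidecode is installed from CPAN (it is not a core module). The helper name `name_to_ascii` and the sample names are made up for illustration. The result is only an approximation of the original name, so it is probably worth a quick look before issuing a registration code.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Text::Unidecode qw(unidecode);

# Hypothetical helper (not part of any module): takes raw UTF-8 bytes
# and returns a plain-ASCII approximation of the text.
sub name_to_ascii {
    my ($bytes) = @_;

    # Turn the bytes into Perl characters; FB_CROAK makes malformed
    # UTF-8 die instead of being silently replaced.
    my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK());

    # Nothing to do if every character is already ASCII.
    return $chars unless $chars =~ /[^\x00-\x7F]/;

    # Transliterate the remaining characters to their closest ASCII form.
    return unidecode($chars);
}

# Simulate incoming UTF-8 bytes for a few made-up sample names
# (Cyrillic, Romanian, and plain ASCII).
my @samples = (
    "\x{418}\x{432}\x{430}\x{43D}",   # Cyrillic
    "\x{218}tefan",                   # Romanian, with S-comma-below
    "John Smith",                     # already ASCII
);

for my $name (@samples) {
    my $bytes = encode('UTF-8', $name);
    print name_to_ascii($bytes), "\n";
}
```

The check for non-ASCII characters covers the "detect" part of the question; Text::Unidecode itself leaves ASCII characters untouched, so the early return is only there to make the intent explicit.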

164

answered Nov 08 '22 18:11

mirod

Tags: character-encoding, ascii, utf-8, perl
