How to convert letters with accents, umlauts, etc to their ASCII counterparts in Perl?

Question

I'm writing a Perl program that processes documents, many of which contain characters such as ä, ö, ü, é, etc. (both uppercase and lowercase). I'd like to replace them with their ASCII counterparts a, o, u, e, etc. How would I do this in Perl?

One solution I thought of is a hash whose keys are the umlaut and accent characters and whose values are their ASCII counterparts. That requires a list of all such characters, which I don't have. If I built one, I'd certainly miss many, as I'm unfamiliar with all the characters that can carry umlauts, accents, and other diacritics.
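
For illustration, the hash idea might look like the sketch below. The table is deliberately incomplete, and the names %ascii_for and replace_known are made up for this example; any character missing from the table passes through unchanged, which is exactly the weakness described above.

```perl
use strict;
use warnings;
use utf8;    # the source file itself contains non-ASCII literals

# Hypothetical, deliberately incomplete lookup table (illustration only).
my %ascii_for = (
    'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'é' => 'e',
    'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U', 'É' => 'E',
);

# Build a character class from the keys so that only known characters match.
my $class = join '', keys %ascii_for;

sub replace_known {
    my ($text) = @_;
    $text =~ s/([$class])/$ascii_for{$1}/g;    # unknown characters are left as-is
    return $text;
}

print replace_known('Zürich, café'), "\n";    # prints "Zurich, cafe"
```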

raina77ow · Accepted Answer

As usual, if you think of a problem that is almost certainly not yours alone, there's already a solution on CPAN. In this case it's called Text::Unidecode:

use warnings;
use strict;
use utf8;
use Text::Unidecode;
print unidecode('ä, ö, ü, é'); # will print 'a, o, u, e'

mob · Answer

Text::Unidecode

See the many disclaimers in its documentation, but it's probably just what you need if you only have Latin text with diacritics.
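
If the text really is only Latin letters with diacritics, one alternative that avoids a CPAN dependency is to decompose each character with the core Unicode::Normalize module and then delete the combining marks. This is a different technique from Text::Unidecode, not a description of how that module works, and the name strip_marks is made up for this sketch. Letters that have no canonical decomposition, such as ß or ø, are left untouched by this approach.

```perl
use strict;
use warnings;
use utf8;    # the source file itself contains non-ASCII literals
use Unicode::Normalize qw(NFD);

# Hypothetical helper: remove diacritics by decomposing, then dropping marks.
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);    # 'é' becomes 'e' + U+0301 COMBINING ACUTE ACCENT
    $decomposed =~ s/\p{Mn}//g;     # delete nonspacing marks (the accents themselves)
    return $decomposed;
}

print strip_marks('ä, ö, ü, é'), "\n";    # prints "a, o, u, e"
```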

Tags:

data-conversion

ascii

perl

diacritics

bodacydo
