Emacs lisp: Translate characters to standard ASCII transcription

Question

I am trying to write a function, that translates a string containing unicode characters into some default ASCII transcription. Ideally I'd like e.g. Ångström to become Angstroem or, if that is not possible, Angstrom. Likewise α=χ should become a=x (c?) or similar.

Does Emacs have such built-in capabilities? I know I can get the names and similar of characters (get-char-code-property) but I know no built-in transcription table.

The purpose is to translate titles of entries into meaningfully readable filenames, avoiding problems with software that doesn't understand unicode.

My current strategy is to build a translation-table by hand, but this approach is fairly limited and requires a lot of maintenance.

Mirzhan Irkegulov · Accepted Answer

There is no built-in capability that i know of. I wrote a package unidecode specifically for your task. It uses the same approach as in Python's same-named library. To install just add MELPA repository to your repository list:

(add-to-list 'package-archives
  '("melpa" . "http://melpa.milkbox.net/packages/") t)

Then run M-x package-install RET unidecode. unidecode has 2 functions, unidecode-unidecode that turns Unicode into ASCII, and unidecode-sanitize that discards non-alphanumeric characters and transforms space into hyphen.

ELISP> (unidecode-unidecode "¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა")
"!Hola!, Gruss Gott, Hyvaa paivaa, Tere ohtust, Bongu Czesc!, Dobry den, Zdravstvuite!, Geia sas, lmsllmlllmckhmslmgll"
ELISP> (unidecode-sanitize "¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა")
"hola-gruss-gott-hyvaa-paivaa-tere-ohtust-bongu-czesc-dobry-den-zdravstvuite-geia-sas-lmsllmlllmckhmslmgll"

Emacs lisp: Translate characters to standard ASCII transcription

Tags:

emacs

unicode

character

elisp

translation

kdb

1 Answers

Mirzhan Irkegulov

Recent Activity

Donate For Us

Emacs lisp: Translate characters to standard ASCII transcription

Tags:

emacs

unicode

character

elisp

translation

kdb

1 Answers

Mirzhan Irkegulov

Related questions

Recent Activity

Donate For Us