Using Markov models to convert all-caps to mixed-case and related problems

Question

I've been thinking about using Markov techniques to restore missing information to natural language text, for tasks such as:

Restore all-caps text to mixed-case.
Restore accents / diacritics to languages which should have them but have been converted to plain ASCII.
Convert rough phonetic transcriptions back into native alphabets.

Those seem to be in order from least to most difficult. In each case the problem is resolving ambiguities based on context.

I could use Wiktionary as a dictionary and Wikipedia as a corpus, with n-grams and hidden Markov models to resolve the ambiguities.

Am I on the right track? Are there already some services, libraries, or tools for this sort of thing?

Examples

GEORGE LOST HIS SIM CARD IN THE BUSH ⇨ George lost his SIM card in the bush
tantot il rit a gorge deployee ⇨ tantôt il rit à gorge déployée

Fred Foo · Accepted Answer

I think you can use Markov models (HMMs) for all three tasks, but also take a look at more modern models such as conditional random fields (CRFs). Also, here's some boost for your google-fu:

Restore mixed case to text in all caps

This is called truecasing.
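
To make the idea concrete, here is a minimal sketch of a truecaser, assuming a tiny hand-made corpus in place of Wikipedia. The names `CORPUS`, `train` and `truecase` are hypothetical, and the score (relative frequency of each cased form plus an add-one bigram count) is a crude stand-in for properly smoothed HMM probabilities. Decoding is a Viterbi-style search over the cased variants seen for each word.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (an assumption for illustration; a real system would use a large corpus).
CORPUS = [
    "George lost his SIM card",
    "he hid in the bush",
    "the bush was green",
    "President Bush spoke",
    "President Bush left",
]

def train(sentences):
    """Count cased variants per lowercased word, and bigrams of cased forms."""
    forms = defaultdict(Counter)  # lowercased word -> Counter of cased variants
    bigrams = Counter()           # (previous cased form, cased form) -> count
    for sentence in sentences:
        prev = "<s>"              # sentence-start marker
        for tok in sentence.split():
            forms[tok.lower()][tok] += 1
            bigrams[(prev, tok)] += 1
            prev = tok
    return forms, bigrams

def truecase(text, forms, bigrams):
    """Pick the best-scoring sequence of cased forms with a Viterbi-style search."""
    best = {"<s>": (0.0, [])}     # cased form -> (score, path ending in that form)
    for word in text.lower().split():
        # Unknown words fall back to their lowercase form.
        candidates = forms.get(word) or {word: 1}
        total = sum(candidates.values())
        new_best = {}
        for form, count in candidates.items():
            emit = math.log(count / total)

            def step(prev):
                # Score of reaching `form` from `prev` (add-one on the bigram count).
                return best[prev][0] + math.log(bigrams[(prev, form)] + 1)

            prev = max(best, key=step)
            new_best[form] = (step(prev) + emit, best[prev][1] + [form])
        best = new_best
    score, path = max(best.values(), key=lambda sp: sp[0])
    return " ".join(path)

forms, bigrams = train(CORPUS)
print(truecase("GEORGE LOST HIS SIM CARD IN THE BUSH", forms, bigrams))
print(truecase("PRESIDENT BUSH SPOKE", forms, bigrams))
```

With this toy corpus the context decides between "bush" and "Bush"; with real data you would also need to handle sentence-initial capitals, which this sketch ignores.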

Restore accents / diacritics to languages which should have them but have been converted to plain ASCII

I suspect Markov models are going to have a hard time on this. OTOH, labelled training data is free since you can just take a bunch of accented text in the target language and strip the accents. See also next answer.
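
Generating that training data can be sketched with the standard library alone. The helper names `strip_accents` and `make_pairs` are made up for illustration. The approach assumes the accents are Unicode combining marks after NFD decomposition, so letters with no decomposition (such as "ø" or "ł") pass through unchanged and would need a separate mapping.

```python
import unicodedata

def strip_accents(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

def make_pairs(accented_sentences):
    """Build (input, target) pairs: stripped text as input, original as label."""
    return [(strip_accents(s), s) for s in accented_sentences]

pairs = make_pairs(["tantôt il rit à gorge déployée"])
print(pairs[0])
```

Each pair gives the model the degraded text as input and the original accented text as the label to recover.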

Convert rough phonetic transcriptions back into native alphabets

This seems strongly related to machine transliteration, which has been tried using pair HMMs (from bioinformatics/genome work).

Tags: unicode, nlp, n-gram, markov-models, ambiguity

hippietrail
