
RegEx: Compare two strings to find Alliteration and Assonance

Would it be possible to compare two strings to find alliteration and assonance?

I mainly use JavaScript or PHP.

Francesco asked Jan 26 '12 04:01



2 Answers

I'm not sure that a regex would be the best way of building a robust comparison tool. A simple regex might be part of a larger solution that used more sophisticated algorithms for non-exact matching.

There are a variety of readily available options for English, some of which could be extended fairly simply to other languages that use the Latin alphabet. Most of these algorithms have been around for years or even decades and are well documented, though they all have limits.

I imagine that there are similar algorithms for non-Latin alphabets but I can't comment on their availability firsthand.

Phonetic Algorithms

The Soundex algorithm is nearly 100 years old and has been implemented in multiple programming languages. It reduces a string to a short code (the first letter followed by three digits) based on how the string is pronounced in English. It is not precise, but it may be useful for identifying similar-sounding words or syllables. I've experimented with it in MS SQL Server, and it is available in PHP.

http://php.net/manual/en/function.soundex.php
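Since the question also mentions JavaScript, here is a minimal sketch of classic American Soundex for that side. It is a hand-written illustration, not a port of PHP's `soundex()`, and I have not checked it against the PHP built-in for edge cases. The function name and the sample words are my own.

```javascript
// Sketch of classic American Soundex (first letter + three digits).
// Assumption: plain ASCII input; anything outside a-z is dropped.
function soundex(word) {
    var codes = {
        b: 1, f: 1, p: 1, v: 1,
        c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
        d: 3, t: 3,
        l: 4,
        m: 5, n: 5,
        r: 6
    };
    var letters = word.toLowerCase().replace(/[^a-z]/g, '');
    if (!letters)
        return '';

    var result = letters.charAt(0).toUpperCase();
    // The first letter's own code also suppresses an immediate duplicate
    var prev = codes[letters.charAt(0)] || 0;

    for (var i = 1; i < letters.length && result.length < 4; i++) {
        var ch = letters.charAt(i);
        // h and w are ignored and do not separate equal codes
        if (ch === 'h' || ch === 'w')
            continue;
        // Vowels (and y) have no code but do separate equal codes
        var code = codes[ch] || 0;
        if (code && code !== prev)
            result += code;
        prev = code;
    }
    // Pad with zeros to a fixed length of four characters
    return (result + '000').slice(0, 4);
}

console.log(soundex('Robert'), soundex('Rupert'));
```

Both sample words come out with the same code, which is the kind of match you would look for. Note that the first letter is kept as-is, so words spelled with different initial letters never get equal codes even if they sound alike.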

General consensus (including the PHP docs) is that Metaphone is much more accurate than Soundex when dealing with the English language. There are numerous implementations available (Wikipedia has a long list at the end of the article) and it is included in PHP.

http://www.php.net/manual/en/function.metaphone.php

Double Metaphone supports a second encoding of a word, corresponding to an alternate pronunciation.

As with Metaphone, Double Metaphone has been implemented in many programming languages (example).

Word Deconstruction

Levenshtein can be used to suggest alternate spellings (for example, to normalize user input) and might be useful as part of a more granular algorithm for alliteration and assonance.

http://www.php.net/manual/en/function.levenshtein.php
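For the JavaScript side, a Levenshtein distance is short enough to write by hand. The sketch below uses the usual dynamic-programming approach with two rows; `closestWord` is a hypothetical helper I added to show the "normalize user input" idea, and the word list is made up.

```javascript
// Edit distance between two strings (insert, delete, substitute = cost 1 each)
function levenshtein(a, b) {
    var prev = [];
    for (var j = 0; j <= b.length; j++)
        prev[j] = j;

    for (var i = 1; i <= a.length; i++) {
        var curr = [i];
        for (j = 1; j <= b.length; j++) {
            var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
            curr[j] = Math.min(
                prev[j] + 1,        // deletion
                curr[j - 1] + 1,    // insertion
                prev[j - 1] + cost  // substitution or match
            );
        }
        prev = curr;
    }
    return prev[b.length];
}

// Hypothetical helper: pick the candidate with the smallest distance
function closestWord(word, candidates) {
    var best = null, bestDistance = Infinity;
    candidates.forEach(function (candidate) {
        var d = levenshtein(word, candidate);
        if (d < bestDistance) {
            best = candidate;
            bestDistance = d;
        }
    });
    return best;
}

console.log(levenshtein('kitten', 'sitting'));
console.log(closestWord('dredful', ['deep', 'dark', 'dreadful']));
```

A misspelled word could be mapped to its nearest known word this way before any phonetic comparison is attempted.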

Logically, it would help to understand the syllabication of the words in the string so that each word could be deconstructed. The syllable break could resolve ambiguity as to how two adjacent letters should be pronounced. This thread has a few links:

PHP Syllable Detection

Tim M. answered Nov 01 '22 23:11



To find alliterations in a text, you simply iterate over all words, skipping words that are too short or too common, and collect consecutive words as long as their initial letters match. A run of three or more such words is reported.

var text = ''
+'\nAs I looked to the east right into the sun,'
+'\nI saw a tower on a toft worthily built;'
+'\nA deep dale beneath a dungeon therein,'
+'\nWith deep ditches and dark and dreadful of sight'
+'\nA fair field full of folk found I in between,'
+'\nOf all manner of men the rich and the poor,'
+'\nWorking and wandering as the world asketh.'

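var skipWords = ['the', 'and'];
var curr = [];

// Log the current run if it holds at least three alliterating words
function flush() {
    if (curr.length > 2)
        console.log(curr);
}

// Visit every word of three or more characters, in order of appearance
var words = text.toLowerCase().match(/\b\w{3,}\b/g) || [];
words.forEach(function(word) {
    if (skipWords.indexOf(word) >= 0)
        return;
    var len = curr.length;
    if (!len || curr[len - 1].charAt(0) == word.charAt(0))
        curr.push(word);
    else {
        flush();
        curr = [word];
    }
});

// Also report a run that reaches the end of the text
flush();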

Results:

["deep", "ditches", "dark", "dreadful"]
["fair", "field", "full", "folk", "found"]
["working", "wandering", "world"]

For more advanced parsing, and also to find assonances and rhymes, you first have to translate the text into phonetic spelling. You didn't say which language you're targeting; for English, there are some phonetic dictionaries available online, for example from Carnegie Mellon: ftp://ftp.cs.cmu.edu/project/fgdata/dict
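Once the words are in phonetic form, comparing two of them is straightforward. Here is a rough sketch, assuming pronunciations are stored as arrays of ARPAbet-style phonemes where vowels end in a stress digit. The `pronunciations` table is a handful of entries I typed by hand for illustration, not an extract of the real dictionary file, and all function names are my own.

```javascript
// Hypothetical, hand-typed entries in an ARPAbet-like notation.
// Assumption: vowel phonemes carry a trailing stress digit, consonants do not.
var pronunciations = {
    deep:  ['D', 'IY1', 'P'],
    green: ['G', 'R', 'IY1', 'N'],
    field: ['F', 'IY1', 'L', 'D'],
    dark:  ['D', 'AA1', 'R', 'K'],
    dale:  ['D', 'EY1', 'L']
};

// Look a word up without tripping over inherited object properties
function phonemes(word) {
    var key = word.toLowerCase();
    return Object.prototype.hasOwnProperty.call(pronunciations, key)
        ? pronunciations[key]
        : null;
}

// Remove the stress digit so 'IY1' and 'IY0' compare as equal
function stripStress(phoneme) {
    return phoneme.replace(/\d$/, '');
}

// Vowel sounds of a word, or null if the word is unknown
function vowelSounds(word) {
    var phones = phonemes(word);
    if (!phones)
        return null;
    return phones
        .filter(function (p) { return /\d$/.test(p); })
        .map(stripStress);
}

// Assonance: the two words share at least one vowel sound
function hasAssonance(a, b) {
    var va = vowelSounds(a), vb = vowelSounds(b);
    if (!va || !vb)
        return false;
    return va.some(function (v) { return vb.indexOf(v) >= 0; });
}

// Alliteration: the two words start with the same sound
function hasAlliteration(a, b) {
    var pa = phonemes(a), pb = phonemes(b);
    if (!pa || !pb)
        return false;
    return stripStress(pa[0]) === stripStress(pb[0]);
}

console.log(hasAssonance('deep', 'green'), hasAlliteration('deep', 'dark'));
```

Unknown words simply return false here; a real tool would need a fallback, such as one of the phonetic algorithms mentioned in the other answer. You may also want to compare only stressed vowels rather than any shared vowel.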

georg answered Nov 01 '22 22:11
