In Arabic, a letter like "ا" (Alef) has several forms/variations:
(ا, أ, إ, آ)
The same goes for the letter ي, which can also be written as ى.
What I am trying to do is get ALL the possible variations of a word that contains several أ and ي letters.
For example, the word "أين" should produce all these possible (mostly incorrect) variations: أين, إين, اين, آين, أىن, إىن, اىن, آىن.
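Because each letter's form is chosen independently, the variants form a Cartesian product, and their count is the product of the number of forms each letter can take. A minimal JavaScript sketch, assuming only the alef forms and ي/ى are interchangeable (`VARIANT_GROUPS` and `countVariants` are illustrative names, not part of the original code):

```javascript
// Groups of interchangeable letters, taken from the question (hypothetical constant name).
const VARIANT_GROUPS = [
  ['ا', 'أ', 'إ', 'آ'],
  ['ي', 'ى'],
];

// Each letter contributes (size of its group) choices, or 1 if it has no variants;
// the total number of spellings is the product of those choices.
function countVariants(word) {
  let total = 1;
  for (const ch of word) {
    const group = VARIANT_GROUPS.find((g) => g.includes(ch));
    total *= group ? group.length : 1;
  }
  return total;
}

console.log(countVariants('أين'));    // 8   = 4 × 2 × 1
console.log(countVariants('أيامنا')); // 128 = 4 × 2 × 4 × 1 × 1 × 4
```

The eight spellings listed for أين match 4 × 2 × 1. The count grows multiplicatively, so a longer word with several alefs quickly produces hundreds of strings.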
Why? I am building a small text correction system that can handle spelling mistakes and replace faulty words with the correct ones.
I have been trying to do this as cleanly as possible, but I ended up with eight for/foreach loops just to handle the letter أ.
There must be a cleaner way to do this! Any thoughts?
Here is my code up to this point:
```php
$alefVariations = ['ا', 'إ', 'أ', 'آ'];
$word = 'أيامنا';

// Break the word into individual (multibyte) letters; -1 means "no limit"
$wordLetters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

// Collect the positions of every alef form in the word
$wordAlefLettersIndexes = [];
for ($letterIndex = 0; $letterIndex < count($wordLetters); $letterIndex++) {
    if (in_array($wordLetters[$letterIndex], $alefVariations)) {
        $wordAlefLettersIndexes[] = $letterIndex;
    }
}

// For each alef position, build one copy of the word per alef form
$eachLetterVariations = [];
foreach ($wordAlefLettersIndexes as $alefLettersIndex) {
    foreach ($alefVariations as $alefVariation) {
        $wordCopy = $wordLetters;
        $wordCopy[$alefLettersIndex] = $alefVariation;
        $eachLetterVariations[$alefLettersIndex][] = $wordCopy;
    }
}

// Combine each of those copies with every alef form at a second, different position
$variations = [];
foreach ($wordAlefLettersIndexes as $alefLettersIndex) {
    $alefWordVariations = $eachLetterVariations[$alefLettersIndex];
    foreach ($wordAlefLettersIndexes as $alefLettersIndex_inner) {
        if ($alefLettersIndex == $alefLettersIndex_inner) continue;
        foreach ($alefWordVariations as $alefWordVariation) {
            foreach ($alefVariations as $alefVariation) {
                $alefWordVariationCopy = $alefWordVariation;
                $alefWordVariationCopy[$alefLettersIndex_inner] = $alefVariation;
                $variations[] = $alefWordVariationCopy;
            }
        }
    }
}

// Join the letters back into strings and drop duplicates
$finalList = [];
foreach ($variations as $variation) {
    $finalList[] = implode('', $variation);
}
return array_unique($finalList);
```
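The nested loops above only ever change two alef positions at a time: a word with a single alef yields an empty list, and a word with three alefs (like أيامنا) never gets the combinations where all three change; ي/ى is not handled at all. Treating the problem as a Cartesian product of per-letter options removes the need for one loop per occurrence: start from one empty prefix and, for each letter, extend every prefix with each allowed form. A minimal sketch in JavaScript, assuming the same two groups of interchangeable letters; `VARIANT_GROUPS`, `optionsFor`, and `expandVariants` are hypothetical names, and a PHP port would follow the same shape with `foreach`:

```javascript
// Interchangeable letter groups (assumed from the question: alef forms, and ي/ى).
const VARIANT_GROUPS = [
  ['ا', 'أ', 'إ', 'آ'],
  ['ي', 'ى'],
];

// Allowed forms for one letter: its whole group, or just the letter itself.
function optionsFor(ch) {
  return VARIANT_GROUPS.find((group) => group.includes(ch)) ?? [ch];
}

// Iterative Cartesian product: start from one empty prefix and, letter by letter,
// extend every prefix with every allowed form of the next letter.
function expandVariants(word) {
  return Array.from(word).reduce(
    (prefixes, ch) => prefixes.flatMap((prefix) => optionsFor(ch).map((form) => prefix + form)),
    ['']
  );
}

const variants = expandVariants('أين');
console.log(variants.length); // 8
console.log(variants.join(', '));
```

Each letter is visited once, so the code does not care how many alefs or yehs the word contains. Because the groups contain distinct letters, no duplicates are produced and no `array_unique` step is needed.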
I don't think this is the right way to do autocorrect, but here's a generic solution to the problem you asked about. It uses recursion and it's in JavaScript (I don't know PHP).
```javascript
function solve(word, sameLetters, customIndices = []) {
  // Array.from splits by code point, so multibyte letters stay intact
  var splitLetters = Array.from(word).map((char, index) => {
    // If this position may vary, replace the letter with its whole variation group
    if (customIndices.length === 0 || customIndices.includes(index)) {
      var variations = sameLetters.find(arr => arr.includes(char));
      if (variations !== undefined) return variations;
    }
    return [char];
  });
  // Up to this point splitLetters looks like this:
  // [["ا","إ","أ","آ"],["ي","ى"],["ا"],["م"],["ن"],["ا"]]
  var res = [];
  recurse(splitLetters, 0, '', res); // generates every combination of the letter options
  return res;
}

// Depth-first walk: pick one option for position `index`, then recurse on the rest
function recurse(letters, index, cur, res) {
  if (index === letters.length) {
    res.push(cur);
  } else {
    for (var letter of letters[index]) {
      recurse(letters, index + 1, cur + letter, res);
    }
  }
}

var sameLetters = [ // the groups of interchangeable letters to enumerate
  ['ا', 'إ', 'أ', 'آ'],
  ['ي', 'ى']
];
var word = 'أيامنا';
var customIndices = [0, 1]; // vary only the letters at these indices; leave empty to vary all of them
var ans = solve(word, sameLetters, customIndices);
console.log(ans);
```
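On the remark that enumerating every variant may not be the best way to do autocorrect: one common alternative, a different technique from the one above, is normalization. Map every variant to a single canonical letter, index the dictionary by the normalized form, and look up the normalized input, so each dictionary word needs one key instead of every combination. A minimal sketch, assuming that folding all alef forms into ا and ى into ي is acceptable for your dictionary; `CANONICAL`, `normalize`, `buildIndex`, `correct`, and the two-word dictionary are hypothetical:

```javascript
// Hypothetical canonical mapping: every alef form -> bare alef, alef maqsura -> yeh.
const CANONICAL = new Map([
  ['أ', 'ا'],
  ['إ', 'ا'],
  ['آ', 'ا'],
  ['ى', 'ي'],
]);

// Replace each letter with its canonical form (letters not in the map are kept).
function normalize(word) {
  return Array.from(word, (ch) => CANONICAL.get(ch) ?? ch).join('');
}

// Index correctly spelled words by their normalized key.
// If two dictionary words share a key, the later one wins in this sketch.
function buildIndex(dictionary) {
  const index = new Map();
  for (const word of dictionary) {
    index.set(normalize(word), word);
  }
  return index;
}

// Look up a possibly misspelled word; returns undefined if nothing matches.
function correct(word, index) {
  return index.get(normalize(word));
}

const index = buildIndex(['أين', 'أيامنا']);
console.log(correct('اىن', index));    // أين
console.log(correct('إيامنا', index)); // أيامنا
```

The trade-off is that words differing only in these letters (for example على and علي) collapse to the same key, so a real system would store a list of candidates per key rather than a single word.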