
Bi-directional, URL friendly Node.js slug function

I am trying to write a Node.js function that takes an article title, whether it is in Arabic, a Latin-script language, or a combination of both, and converts it to a URL-friendly string that respects text direction.

Currently, as long as the title does not mix text directions, everything works perfectly. Here are some tests in different languages:

makeURLFriendly("Est-ce que vous avez des frères et sœurs? (Do you have siblings?)")
// French test, returns:
// est-ce-que-vous-avez-des-freres-et-soeurs-do-you-have-siblings

makeURLFriendly("Kannst du/ Können Sie mir helfen?")
// German test, returns:
// kannst-du-konnen-sie-mir-helfen

makeURLFriendly("A=+n_the)m, w!h@a#`t w~e k$n%o^w s&o f*a(r!")
// English with a bunch of symbols test, returns:
// anthem-what-we-know-so-far

makeURLFriendly("إليك أقوى برنامج إسترجاع ملفات في العالم بعرض حصري !")
// Arabic test, returns:
// إليك-أقو-برنامج-إسترجاع-ملفات-في-العالم-بعرض-حصري

Problems start when languages with different directions are used together. The problem isn't only in what the function returns but also in what is passed to it. For example, when I type a test title in Arabic mixed with English, I get something like this:

ماكروسوفت تطور من Outlook.com

The directions are messed up, but I noticed that when I paste the same string into Facebook, it gets fixed:

[Image: a Facebook message]

How can I achieve the same result in Node.js before passing the string to the makeURLFriendly function?

Mohamed Seif Khalid asked Oct 26 '25 09:10



1 Answer

The solution was to add U+202B, the "Right-to-Left Embedding" character, to the beginning of the string and before every left-to-right word.
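As a minimal sketch of that idea on its own, assuming the title has already been cleaned up and split into words: the names `RLE`, `hasRightToLeft` and `embedRightToLeft` are hypothetical and only illustrate the embedding step, not the clean-up around it.

```javascript
// Hypothetical helper, shown only to illustrate the embedding step.
const RLE = "\u202B" // U+202B RIGHT-TO-LEFT EMBEDDING

// Matches any character from the Hebrew or Arabic Unicode blocks.
const hasRightToLeft = /[\u0590-\u05ff\u0600-\u06ff]/

// Puts one U+202B at the start of the slug and one before every
// word that contains no right-to-left character.
const embedRightToLeft = words =>
    RLE + words
        .map(word => (hasRightToLeft.test(word) ? word : RLE + word))
        .join("-")

// Usage: the words of an already cleaned mixed-direction title.
const slug = embedRightToLeft(["تطور", "من", "outlook", "com"])
console.log(slug.split("-").length) // prints 4
```

The embedding characters are invisible, so the result looks like a plain hyphenated slug, but they are part of the string and will be percent-encoded if the slug is put in a URL.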

Here is the final function if someone wants it:

const makeURLFriendly = string => {
    let urlFriendlyString = ""

    // Initial clean up.
    string = string
        // Remove spaces from start and end.
        .trim()
        // Changes all characters to lower case.
        .toLowerCase()
        // Replace symbols with a space.
        .replace(/[`~!@#$%^&*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/g, " ")

    // Special characters and the characters they will be replaced by.
    const specialCharacters = "àáäâãåăæçèéëêǵḧìíïîḿńǹñòóöôœṕŕßśșțùúüûǘẃẍÿź"
    const replaceCharacters = "aaaaaaaaceeeeghiiiimnnnoooooprssstuuuuuwxyz"
    // Creates a regular expression that matches all the special characters
    // from the specialCharacters constant. Will make something like this:
    // /à|á|ä/g and matches à or á or ä...
    const specialCharactersRegularExpression = new RegExp(
        specialCharacters.split("").join("|"),
        "g"
    )
    // Replaces special characters by their url friendly equivalent.
    // "œ" is turned into "oe" first; otherwise the table above
    // would already have replaced it with a single "o".
    string = string
        .replace(/œ/g, "oe")
        .replace(
            specialCharactersRegularExpression,
            matchedCharacter => replaceCharacters.charAt(
                specialCharacters.indexOf(matchedCharacter)
            )
        )

    // Only keeps Arabic, English and numbers in the string.
    const arabicLetters = "ىشغظذخثتسرقضفعصنملكيطحزوهدجبأاإآلإلألآؤءئة"
    const englishLetters = "abcdefghijklmnopqrstuvwxyz"
    const numbers = "0123456789"
    for (let character of string) {
        if (character === " ") {
            urlFriendlyString += character
            continue
        }
        const characterIsURLFriendly = Boolean(
            arabicLetters.includes(character) ||
            englishLetters.includes(character) ||
            numbers.includes(character)
        )
        if (characterIsURLFriendly) urlFriendlyString += character
    }

    // Clean up before text direction algorithm.
    // Replace each run of whitespace with a single hyphen.
    urlFriendlyString = urlFriendlyString.replace(/\s+/g, "-")

    // Regular expression that matches strings that have
    // right to left direction.
    const isRightToLeft = /[\u0590-\u05ff\u0600-\u06ff]/u
    // Makes an array of all the words in urlFriendlyString,
    // skipping the empty strings that a leading or trailing
    // hyphen would otherwise produce.
    let words = urlFriendlyString.split("-").filter(word => word !== "")

    // Checks if urlFriendlyString is a unidirectional string:
    // every word must have the same direction as the first one.
    const wordDirections = words.map(word => isRightToLeft.test(word))
    const stringIsUnidirectional = wordDirections.every(
        isWordRightToLeft => isWordRightToLeft === wordDirections[0]
    )

    // If the string is unidirectional, there is no need for
    // it to pass through our bidirectional algorithm.
    if (stringIsUnidirectional) {
        return urlFriendlyString
            // Replaces multiple hyphens by one hyphen
            .replace(/-+/g, "-")
            // Remove hyphen from start.
            .replace(/^-+/, "")
            // Remove hyphen from end.
            .replace(/-+$/, "")
    }

    // Reset urlFriendlyString so we can rewrite it in the
    // direction we want.
    urlFriendlyString = ""
    // Add U+202B "Right to Left Embedding" character to the
    // start of the words array.
    words.unshift("\u202B")
    // Loop through the values in the words array.
    for (let word of words) {
        // Concatenate - before every word (the first one will
        // be cleaned later on).
        urlFriendlyString += "-"
        // If the word isn't right to left, concatenate the "Right
        // to Left Embedding" character before the word.
        if (!isRightToLeft.test(word)) urlFriendlyString += `\u202B${word}`
        // Otherwise just concatenate the word.
        else urlFriendlyString += word
    }

    return urlFriendlyString
        // Replaces multiple hyphens by one hyphen.
        .replace(/-+/g, "-")
        // Remove hyphen from start.
        .replace(/^-+/, "")
        // Remove hyphen from end.
        .replace(/-+$/, "")
        // The character U+202B is invisible, so if it is in the start
        // or the end of a string, the first two regular expressions won't
        // match them and the string will look like it still has hyphens
        // in the start or the end.
        .replace(/^\u202B-+/, "")
        .replace(/-+\u202B$/, "")
        // Removes the first U+202B that is directly followed by
        // hyphens, along with those hyphens.
        .replace(/\u202B-+/, "")

}
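A possible alternative to the specialCharacters / replaceCharacters table, sketched here rather than taken from the function above, is to let Unicode normalization strip the Latin accents. The name `stripLatinDiacritics` is hypothetical. Note that it maps "æ" to "ae" and "ß" to "ss", whereas the table maps them to "a" and "s". The final NFC step puts Arabic letters such as "أ" back together, since NFD also splits those into a base letter plus a combining hamza.

```javascript
// Hypothetical alternative to the lookup table: not the method
// used in the answer, just a sketch built on String.prototype.normalize.
const stripLatinDiacritics = text =>
    text
        // Ligatures and ß have no decomposition, so map them by hand.
        .replace(/œ/g, "oe")
        .replace(/æ/g, "ae")
        .replace(/ß/g, "ss")
        // Split accented letters into base letter + combining mark.
        .normalize("NFD")
        // Drop the combining marks used by Latin accents.
        .replace(/[\u0300-\u036f]/g, "")
        // Recompose whatever was not stripped (e.g. Arabic hamza forms).
        .normalize("NFC")

console.log(stripLatinDiacritics("frères et sœurs")) // prints "freres et soeurs"
console.log(stripLatinDiacritics("können"))          // prints "konnen"
```

It would have to run after .toLowerCase(), like the table it replaces, because the hand-written replacements only cover lower-case ligatures.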

Also, when I .split() the returned string, the words come out in the right order, which may help with SEO. The console I use doesn't display Arabic characters properly, or at all, so I wrote this script to write the returned values to a file for testing:

const fs = require("fs")

const test = () => {
    const writeStream = fs.createWriteStream("./test.txt")

    writeStream.write(makeURLFriendly("Est-ce que vous avez des frères et sœurs? (Do you have siblings?)"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("Quel est ton/votre film préféré? (What’s your favorite movie?)"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("Kannst du/ Können Sie mir helfen?"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("Ich bin (Übersetzer/Dolmetscher) / Geschäftsmann"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("你吃饭了吗"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("慢慢吃"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("# (sd sdsds   (lakem 0.5) "))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("A=+n_the)m, w!h@a#`t w~e k$n%o^w s&o f*a(r!"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("كيف تجد النيش ذات النقرات مرتفعة الثمن في أدسنس"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("إليك أقوى برنامج إسترجاع ملفات في العالم بعرض حصري !"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("عاجل ...  شركة Oppo تستعرض هاتفها الجديد  Eno"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("إنترنيت الجيل الخامس مميزاتها ! و هل صحيح ما يقوله الخبراء عن سوء إستخدامها من طرف الصين للتجسس ؟؟"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("لماذا إنخفضت أسهم شركة Apple بنسبة %20 ؟؟"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("10 نصائح لتصبح محترف في مجال Dropshipping"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly(`"إيلون ماسك" و "زوكربرغ"... ما سبب الخلاف يا ترى ؟`))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly(`ماكروسوفت تطور من Outlook.com`))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly(`ما هو  HTTPS  و هل يضمن الأمان %100 ؟؟`))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly(`ما هي خدمة  Apple TV+ و لماذا هذا التوقيت ؟؟`))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly(`مُراجعة هاتف سَامسونغ S10 Plus`))
    // Close the stream once everything has been written.
    writeStream.end()
}

test()
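Since U+202B is invisible even in an editor that renders Arabic correctly, another way to check the output is to print every non-ASCII character as an escape. This is only a sketch; `escapeNonAscii` is a hypothetical helper and is not used by the script above.

```javascript
// Hypothetical helper: replaces every non-ASCII UTF-16 code unit
// with its \uXXXX escape so invisible characters show up in a console.
const escapeNonAscii = text =>
    text.replace(/[^\x00-\x7f]/g, character =>
        "\\u" + character.charCodeAt(0).toString(16).padStart(4, "0")
    )

console.log(escapeNonAscii("\u202Boutlook-com")) // prints \u202boutlook-com
```

Wrapping a makeURLFriendly call in it, for example escapeNonAscii(makeURLFriendly(title)), makes it easy to see where the embedding characters ended up.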
Mohamed Seif Khalid answered Oct 27 '25 23:10
