
Automatically add wikilinks in a MediaWiki page, given a list of page titles

Right now, I'm trying to create a script that automatically creates links to other pages in a wiki document.

function createLinks(startingSymbol, endingSymbol, text, links){
    //this needs to be implemented somehow - replace every match of the list of links with a link
}
createLinks("[[", "]]", "This is the text to wikify", ["wikify", "text"]);
//this function would return "This is the [[text]] to [[wikify]]" as its output.

The most obvious solution would be to replace every occurrence of each title with its bracketed form (for example, text with [[text]]), but that causes problems when one title contains another. For example, if I wikified both "some problems" and "problems" within the string "some problems", I would end up with "[[some [[problems]]]]". Is there any way to work around this issue?
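
The nesting problem is easy to reproduce. Here is a minimal sketch, assuming a hypothetical naiveWikify helper (not part of my script) that links every occurrence of each title in turn:

```javascript
// Hypothetical helper for illustration only: links every occurrence of each
// title, one title at a time, with no protection against overlapping titles.
function naiveWikify(text, titles) {
    for (const title of titles) {
        text = text.split(title).join("[[" + title + "]]");
    }
    return text;
}

const nested = naiveWikify("I ran into some problems", ["some problems", "problems"]);
console.log(nested); // "I ran into [[some [[problems]]]]"
```

The second pass finds "problems" inside the link created by the first pass, which is what produces the nested brackets.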

Anderson Green asked Nov 04 '22 07:11


2 Answers

Here's another approach, based on dynamically building a regexp:

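function wikifyText (startString, endString, text, list) {
    // Escape only regexp metacharacters; escaping every non-[a-z0-9_] character
    // would turn capital letters such as B or W into \B or \W, which are special.
    list = list.map( function (str) {
        return str.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' );
    });
    // Reverse-sort so that a longer title precedes any title that is its prefix.
    list.sort();
    list.reverse();
    var re = new RegExp( '\\b(' + list.join('|') + ')\\b', 'g' );
    return text.replace( re, startString + '$1' + endString );
}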

(JSFiddle)
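
To illustrate why the list is sorted and reversed before being joined, here is a small sketch with two hand-written regexps (the sample sentence and variable names are made up for the example). JavaScript tries alternatives from left to right, so a title that is a prefix of a longer title must come after it:

```javascript
const sample = "I ran into some problems";

// Shorter alternative first: the regexp stops at "some" and never tries the longer title.
const shortFirst = sample.replace(/\b(some|some problems)\b/g, "[[$1]]");

// Longer alternative first, which is the order the reverse sort produces.
const longFirst = sample.replace(/\b(some problems|some)\b/g, "[[$1]]");

console.log(shortFirst); // "I ran into [[some]] problems"
console.log(longFirst);  // "I ran into [[some problems]]"
```

Because the whole text is scanned in a single pass, text that has already been linked is never matched again, so nested links cannot occur.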

The \b anchors at both ends of the regexp prevent this version from trying to wikify any partial words, but you could relax this restriction if you wanted. For example, replacing the regexp construction with:

    var re = new RegExp( '\\b(' + list.join('|') + ')(?=(e?s)?\\b)', 'g' );

would allow an s or es suffix at the end of the last wikified word (JSFiddle). Note that MediaWiki automatically includes such suffixes as part of the link text when the page is displayed.
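
As a quick check of the look-ahead, here is a sketch with a single hard-coded title (the sample sentence is made up for the example):

```javascript
// The look-ahead accepts an optional "s" or "es" after the title, but does not
// consume it, so the suffix stays outside the brackets.
const suffixRe = /\b(car)(?=(e?s)?\b)/g;
const linked = "cars and car and careful".replace(suffixRe, "[[$1]]");

console.log(linked); // "[[car]]s and [[car]] and careful"
```

"careful" is left alone because what follows "car" there is neither a word boundary nor an s/es suffix followed by one.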


Edit: Here's a version that also allows the first letter of each phrase to be case-insensitive, like MediaWiki page titles are. It also replaces the \b anchors with a slightly more Unicode-friendly solution:

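function wikifyText (startString, endString, text, list) {
    list = list.map( function (str) {
        // Prefix the title with both cases of its first letter, e.g. "text" -> "Ttext".
        var first = str.charAt(0);
        str = first.toUpperCase() + first.toLowerCase() + str.slice(1);
        // Backslash-escape every non-word character.
        str = str.replace( /(\W)/g, '\\$1' );
        // Wrap the two leading characters in a character class, e.g. "[Tt]ext".
        return str.replace( /^(\\?.\\?.)/, '[$1]' );
    });
    list.sort();
    list.reverse();
    // (\W|$) rather than \W alone, so that a title at the very end of the text still matches.
    var re = new RegExp( '(^|\\W)(' + list.join('|') + ')(?=(e?s)?(\\W|$))', 'g' );
    return text.replace( re, '$1' + startString + '$2' + endString );
}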

(JSFiddle)
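
To make the mapping step easier to follow, here it is pulled out into a standalone function. The name titleToPattern is only for this illustration; the logic is the same three steps as in the map callback above:

```javascript
// Illustration only: turns one page title into a regexp fragment whose first
// letter matches in either case.
function titleToPattern(str) {
    var first = str.charAt(0);
    str = first.toUpperCase() + first.toLowerCase() + str.slice(1); // "text" -> "Ttext"
    str = str.replace(/(\W)/g, "\\$1");                              // escape non-word characters
    return str.replace(/^(\\?.\\?.)/, "[$1]");                       // "Ttext" -> "[Tt]ext"
}

console.log(titleToPattern("text"));       // [Tt]ext
console.log(titleToPattern("great care")); // [Gg]reat\ care
```

Only the first letter is wrapped in a character class, so "Great care" and "great care" both match, while "great Care" does not.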

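This would be a lot less messy if JavaScript regexps supported such standard PCRE features as case-insensitive sections, look-behind or Unicode character classes. (Engines that implement ES2018 do support look-behind and, with the u flag, Unicode property escapes such as \p{L}; the code above does not rely on either.)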

Because of the last of these missing features, even this solution is not completely Unicode-aware: it allows links to begin after or end before any character that matches \W, which includes punctuation but also every non-ASCII character, letters included. (However, non-ASCII letters inside links are handled correctly.) In practice, I don't think this should be a major issue.
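
For engines that do have look-behind and Unicode property escapes (ES2018 and later), something along these lines should avoid the \W problem. This is only a sketch under that assumption: the name wikifyTextUnicode, the choice of \p{L}, \p{N} and _ as "word" characters, and the length-based sort are mine and not part of the versions above.

```javascript
// Sketch only: wikifyTextUnicode is a hypothetical name, and the approach
// assumes an ES2018 engine (look-behind and \p{...} with the "u" flag).
function wikifyTextUnicode(startString, endString, text, titles) {
    // Escape regexp syntax characters only; the "u" flag rejects other escapes.
    const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

    const alternatives = titles
        .filter((title) => title.length > 0)
        .sort((a, b) => b.length - a.length) // longest titles first
        .map((title) => {
            const [first] = title; // first code point of the title
            const rest = escapeRegExp(title.slice(first.length));
            const upper = escapeRegExp(first.toUpperCase());
            const lower = escapeRegExp(first.toLowerCase());
            // Accept either case for the first letter only.
            const head = upper === lower ? upper : "(?:" + upper + "|" + lower + ")";
            return head + rest;
        });
    if (alternatives.length === 0) return text;

    const wordChar = "[\\p{L}\\p{N}_]";
    const re = new RegExp(
        "(?<!" + wordChar + ")(?:" + alternatives.join("|") + ")" +
        "(?=(?:e?s)?(?!" + wordChar + "))",
        "gu"
    );
    // A replacer function keeps "$" in startString/endString from being
    // interpreted as a replacement pattern.
    return text.replace(re, (match) => startString + match + endString);
}

const unicodeResult = wikifyTextUnicode("[[", "]]", "Über cars: text, café", ["über", "car", "caf"]);
console.log(unicodeResult); // "[[Über]] [[car]]s: text, café"
```

Here "caf" is not linked inside "café", because the character that follows it is a letter even though it is not ASCII.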

Ilmari Karonen answered Nov 11 '22 16:11


I've created a working demo of a script that does almost exactly what I need it to do.

http://jsfiddle.net/8JcZC/2/

alert(wikifyText("[[", "]]", "There are cars, be careful, carefully, and with great care!!", ["text", "hoogahjush", "wikify", "car", "careful", "carefully", "great care"]));

function wikifyText(startString, endString, text, list){
    //sort a copy of the list by ascending length, leaving the caller's array untouched
    list = list.slice().sort(function(a, b){
        return a.length - b.length; // ASC -> a - b; DESC -> b - a
    });
    //wikify each title in turn; replace() with a string pattern only changes the first occurrence
    for(var i = 0; i < list.length; i++){
        text = text.replace(list[i], startString + list[i] + endString);
    }
    return text;
}

A word of caution: in some cases, this script may wikify words that are part of other words. For example, if "car" is in the list and "careful" is not, then "car" will be wikified inside "careful", like this: "[[car]]eful". Also, because replace() is called with a string rather than a global regexp, only the first occurrence of each title is linked. I hope that I will be able to work around these limitations.
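
One possible workaround, which the demo above does not use, is to match each title with a regexp that has \b word-boundary anchors and the g flag. A minimal sketch with a hard-coded title and a made-up sentence:

```javascript
const sentence = "be careful with the car";

// String pattern: the first occurrence is replaced, even inside another word.
const plain = sentence.replace("car", "[[car]]");

// Word-boundary regexp with the g flag: only whole-word occurrences are replaced.
const bounded = sentence.replace(/\bcar\b/g, "[[car]]");

console.log(plain);   // "be [[car]]eful with the car"
console.log(bounded); // "be careful with the [[car]]"
```

Note that \b only knows about ASCII word characters, so this would still need more work for titles containing non-ASCII letters.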

Anderson Green answered Nov 11 '22 15:11