The title is a bit misleading maybe; my spellchecker focuses more on format than spelling (caps, punctuation and spaces, apostrophes, converting internet slang to full words, oft-scrambled words etc.). However the basic principles apply.
Basically, the JS/jQuery checker I'm building would correct words as they are typed (after a space or punctuation has been typed after the word).
However, much like any autocorrecting, it's bound to run into mistakes. I'm not even considering creating functionality that would determine whether "its" or "it's" is more appropriate in a given case (though if such a plugin or code snippet exists, do point me to one).
So I want to make it a "yielding" autocorrect (for lack of a better name). Basically;
The easiest option, of course, would be to disable the check for that word entirely, but I want the checker to correct future instances of it. What I'm looking for would detect a user editing an autocorrected word (regardless of whether it happens right after typing or later) back to what it was before being autocorrected, and then learn to leave that specific instance of that word alone.
I don't even know where to begin with this. I'm thinking a contenteditable with each word wrapped in a span, autocorrected ones having a special class and a data-* attribute containing the original one, listen for edits on the autocorrected words, and if it's edited back to equaling the data-* value, add a class that leaves it out of future autocorrect rounds.
I'm thinking though that this might be unnecessarily complicated, or at least not the path of least resistance. What would be the smartest way of doing this?
Your suggested approach (separating each word in a span and storing additional data in it) at first glance seems to be the most sensible approach. On the editor level, you just need to ensure all text is inside some span, and that each of them contains only a single word (splitting it if necessary). On the word level, just listen for changes in the spans (binding input and propertyChange) and act according to its class/data.
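The word-level bookkeeping described above can be sketched without any DOM at all. In this minimal sketch a plain object stands in for a span with its class and data-* attribute; `makeWord`, `onEdit`, and the `locked` flag are hypothetical names, not part of any library, and the checker is assumed to be a function from a word to its corrected form.

```javascript
// Hypothetical per-word record: `original` plays the role of the data-* attribute,
// `corrected` and `locked` play the role of the CSS classes.
function makeWord(typed, checker) {
    var fixed = checker(typed);
    return { text: fixed, original: typed, corrected: fixed !== typed, locked: false };
}

// Called when the user edits a word. Editing a corrected word back to what
// was originally typed locks this instance so it is not corrected again.
function onEdit(word, newText, checker) {
    if (word.corrected && newText === word.original)
        word.locked = true;
    if (word.locked) {
        word.text = newText;               // yield to the user
    } else {
        word.original = newText;
        word.text = checker(newText);
        word.corrected = word.text !== newText;
    }
    return word;
}

// Assumed toy checker for illustration only.
var toyChecker = function (w) { return w === "its" ? "it's" : w; };
var first = makeWord("its", toyChecker);   // corrected to "it's"
onEdit(first, "its", toyChecker);          // user reverts it: this instance is locked
var second = makeWord("its", toyChecker);  // later instances are still corrected
```

Only the reverted instance is locked; the checker itself is untouched, so future occurrences of the same word are still corrected.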
However, the real pain is keeping the caret position consistent. When you change the contents of either a textarea or an element with contentEditable, the caret moves rather unpredictably, and there's no easy (cross-browser) way of keeping track of the caret. I searched for solutions both here at SO and elsewhere, and the simplest working solution I found was this blog post. Unfortunately it only applies to textarea, so the "each word in a span" solution couldn't be used.
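So, I suggest the following approach:
- Keep a list of words in an Array, where each word stores both the current value and the original;
- When the contents of the textarea change, keep the set of unchanged words and redo the rest;
- Only apply the spell check if the caret is just after a non-word character (room for improvement) and you're not hitting backspace;
- If the user was unsatisfied with the correction, hitting backspace once will undo it, and it won't be checked again unless modified.
- When several corrections are pending, each backspace will undo one correction until none is left.
- To keep a word as typed, hit backspace once to "protect" it.

I created a simple proof-of-concept at jsFiddle. Details below. Note that you can combine it with other approaches (for instance, detecting a "down arrow" key and displaying a menu with some auto-correcting options).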
Steps of the proof-of-concept explained in detail:
Keep a list of words in an Array, where each word stores both the current value and the original;
var words = [];
This regex splits the text into words (each word has a word property and a sp one; the latter stores the non-word characters immediately following it):
delimiter:/^(\w+)(\W+)(.*)$/,
...
regexSplit:function(regex,text) {
    var ret = [];
    // Peel one word plus its trailing non-word characters off the front each time
    for ( var match = regex.exec(text) ; match ; match = regex.exec(text) ) {
        ret.push({
            word:match[1],
            sp:match[2],
            length:match[1].length + match[2].length
        });
        text = match[3];
    }
    // Whatever is left (e.g. a word still being typed) becomes the last entry
    if ( text )
        ret.push({word:text, sp:'', length:text.length});
    return ret;
}
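To see what the splitter produces, here is a standalone copy of the same function run on a short string. The sample strings are made up for illustration; the second call shows a limitation worth knowing about: text that starts with a non-word character never matches the pattern, so it comes back as a single entry.

```javascript
// Same pattern as the `delimiter` option above.
var delimiter = /^(\w+)(\W+)(.*)$/;

// Standalone copy of regexSplit so it can run outside the plugin.
function regexSplit(regex, text) {
    var ret = [];
    for (var match = regex.exec(text); match; match = regex.exec(text)) {
        ret.push({
            word: match[1],
            sp: match[2],
            length: match[1].length + match[2].length
        });
        text = match[3];
    }
    if (text)
        ret.push({ word: text, sp: '', length: text.length });
    return ret;
}

var parts = regexSplit(delimiter, "teh cat, sat");
// parts[0] is { word: "teh", sp: " ",  length: 4 }
// parts[1] is { word: "cat", sp: ", ", length: 5 }
// parts[2] is { word: "sat", sp: "",   length: 3 }

var leading = regexSplit(delimiter, " hi");
// leading has one entry whose word is " hi": the pattern requires a word first
```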
When the contents of the textarea change, keep the set of unchanged words and redo the rest;
// Split all the text
var split = $.autocorrect.regexSplit(options.delimiter, $this.val());
// Find unchanged words in the beginning of the field
var start = 0;
while ( start < words.length && start < split.length ) {
    if ( !words[start].equals(split[start]) )
        break;
    start++;
}
// Find unchanged words in the end of the field, never reaching back
// past the unchanged prefix in either list
var end = 0;
while ( start < words.length - end && start < split.length - end ) {
    if ( !words[words.length-end-1].equals(split[split.length-end-1]) )
        break;
    end++;
}
// Autocorrects words in-between
var toSplice = [start, words.length-end - start];
for ( var i = start ; i < split.length-end ; i++ )
    toSplice.push({
        word:check(split[i], i),
        sp:split[i].sp,
        original:split[i].word,
        equals:function(w) {
            return this.word == w.word && this.sp == w.sp;
        }
    });
words.splice.apply(words, toSplice);
// Updates the text, preserving the caret position
updateText();
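The prefix/suffix comparison is the part most likely to go wrong, so here is a standalone sketch of just that step. `unchangedRange` is a hypothetical helper name, and the sample arrays are invented; the suffix scan is bounded by `start` on both lists so the two unchanged regions can never overlap.

```javascript
// Hypothetical helper: counts how many entries match at the beginning (start)
// and at the end (end) of the old and new word lists.
function unchangedRange(oldWords, newWords) {
    function same(a, b) { return a.word === b.word && a.sp === b.sp; }
    var start = 0;
    while (start < oldWords.length && start < newWords.length &&
           same(oldWords[start], newWords[start]))
        start++;
    var end = 0;
    while (start < oldWords.length - end && start < newWords.length - end &&
           same(oldWords[oldWords.length - end - 1], newWords[newWords.length - end - 1]))
        end++;
    return { start: start, end: end };
}

var before = [{ word: "I", sp: " " }, { word: "like", sp: " " }, { word: "cats", sp: "" }];
var after = [{ word: "I", sp: " " }, { word: "realy", sp: " " },
             { word: "like", sp: " " }, { word: "cats", sp: "" }];
var range = unchangedRange(before, after);
// range.start is 1 and range.end is 2, so only after[1] ("realy") needs checking

// Deleting from the middle of repeated words must not double-count an entry.
var shrunk = unchangedRange(
    [{ word: "a", sp: " " }, { word: "b", sp: " " }, { word: "a", sp: " " }],
    [{ word: "a", sp: " " }]);
// shrunk.start is 1 and shrunk.end is 0
```

The words to re-check are then the entries of the new list from index `start` up to, but not including, `newWords.length - end`.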
Only apply the spell check if the caret is just after a non-word character (room for improvement) and you're not hitting backspace;
var caret = doGetCaretPosition(this);
// True when the two characters before the caret are a word character
// followed by a non-word character, i.e. a word has just been finished
var atFirstSpace = caret >= 2 &&
    /\w\W/.test($this.val().substring(caret-2,caret));
function check(word, index) {
    var w = (atFirstSpace && !backtracking ) ?
        options.checker(word.word) :
        word.word;
    if ( w != word.word )
        stack.push(index); // stack stores a list of auto-corrections
    return w;
}
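The caret test can be tried in isolation. The sketch below wraps the same expression in a function that takes the text and caret offset as arguments; `justFinishedWord` is a hypothetical name used only here.

```javascript
// Hypothetical helper mirroring the atFirstSpace expression above:
// true only when the caret sits right after the first non-word
// character that follows a word character.
function justFinishedWord(text, caret) {
    return caret >= 2 && /\w\W/.test(text.substring(caret - 2, caret));
}

var afterSpace = justFinishedWord("teh ", 4);    // "h " before the caret
var midWord = justFinishedWord("teh", 3);        // "eh" before the caret
var secondSpace = justFinishedWord("teh  ", 5);  // "  " before the caret
```

The third case is why the variable is called atFirstSpace: a second space does not trigger another check.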
If the user was unsatisfied with the correction, hitting backspace once will undo it, and it won't be checked again unless modified.
$(this).keydown(function(e) {
    if ( e.which == 8 ) { // backspace
        if ( stack.length > 0 ) {
            // Undo the most recent autocorrection and swallow the keystroke
            var last = stack.pop();
            words[last].word = words[last].original;
            updateText(last);
            return false;
        }
        else
            backtracking = true;
        stack = [];
    }
});
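Stripped of the event handling, the undo step is a pop from a stack of word indices. The sketch below models only that part; `undoLast` is a hypothetical name and the sample data is invented.

```javascript
// Hypothetical helper: reverts the most recent correction and returns the
// index of the word that was reverted, or -1 if nothing is left to undo.
function undoLast(words, stack) {
    if (stack.length === 0)
        return -1;
    var last = stack.pop();
    words[last].word = words[last].original;
    return last;
}

// Three words, two of which were autocorrected (indices 0 and 2).
var sample = [
    { word: "the", sp: " ", original: "teh" },
    { word: "cat", sp: " ", original: "cat" },
    { word: "don't", sp: " ", original: "dont" }
];
var pending = [0, 2];

var firstUndo = undoLast(sample, pending);   // reverts index 2
var secondUndo = undoLast(sample, pending);  // reverts index 0
var thirdUndo = undoLast(sample, pending);   // nothing left
```

Corrections are undone in reverse order, one per backspace, which matches the behaviour described for pasted text containing several corrected words.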
The code for updateText simply joins all words again into a string, and sets the value back to the textarea. The caret is preserved if nothing was changed, or placed just after the last autocorrection done/undone, to account for changes in the text length:
function updateText(undone) {
    var caret = doGetCaretPosition(element);
    var text = "";
    for ( var i = 0 ; i < words.length ; i++ )
        text += words[i].word + words[i].sp;
    $this.val(text);
    // If a word was autocorrected, put the caret right after it
    if ( stack.length > 0 || undone !== undefined ) {
        var last = undone !== undefined ? undone : stack[stack.length-1];
        caret = 0;
        for ( var i = 0 ; i < last ; i++ )
            caret += words[i].word.length + words[i].sp.length;
        // +1 skips the first delimiter character after the word
        caret += words[last].word.length + 1;
    }
    setCaretPosition(element,caret);
}
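The caret arithmetic is easier to check on a concrete string. This sketch pulls the offset calculation into a pure function; `caretAfterWord` is a hypothetical name and the word list is invented for the example.

```javascript
// Hypothetical helper: offset just past the word at `index` plus one
// delimiter character, computed the same way as in updateText.
function caretAfterWord(words, index) {
    var caret = 0;
    for (var i = 0; i < index; i++)
        caret += words[i].word.length + words[i].sp.length;
    return caret + words[index].word.length + 1;
}

// Represents the text "the cat, sat"
var sentence = [
    { word: "the", sp: " " },
    { word: "cat", sp: ", " },
    { word: "sat", sp: "" }
];

var afterCat = caretAfterWord(sentence, 1);
// 4 characters for "the " + 3 for "cat" + 1 for the comma = 8
```

Note that the `+ 1` assumes at least one delimiter character follows the word. For the last word here, which has an empty `sp`, the result is 13, one past the end of the 12-character text.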
The final plugin structure:
$.fn.autocorrect = function(options) {
    options = $.extend({
        delimiter:/^(\w+)(\W+)(.*)$/,
        checker:function(x) { return x; }
    }, options);
    return this.each(function() {
        var element = this, $this = $(this);
        var words = [];
        var stack = [];
        var backtracking = false;
        function updateText(undone) { ... }
        $this.bind("input propertyChange", function() {
            stack = [];
            // * Only apply the spell check if the caret...
            // * When the contents of the `textarea` changes...
            backtracking = false;
        });
        // * If the user was unsatisfied with the correction...
    });
};
$.autocorrect = {
    regexSplit:function(regex,text) { ... }
};
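The default checker returns every word unchanged, so the plugin does nothing until you pass your own. Below is a minimal sketch of a checker for the slang-to-full-word case from the question; the `replacements` table and the name `slangChecker` are made up for illustration.

```javascript
// Hypothetical lookup table; extend it with whatever your checker should fix.
var replacements = { u: "you", teh: "the", dont: "don't", im: "I'm" };

// Takes one word and returns its replacement, or the word itself if it is
// not in the table. hasOwnProperty keeps inherited keys such as
// "constructor" from being treated as table entries.
function slangChecker(word) {
    var key = word.toLowerCase();
    return Object.prototype.hasOwnProperty.call(replacements, key)
        ? replacements[key]
        : word;
}

var fixedWord = slangChecker("teh");     // "the"
var untouched = slangChecker("hello");   // "hello"

// With jQuery and the plugin above loaded, it would be wired up like this:
// $("textarea").autocorrect({ checker: slangChecker });
```

This simple version does not preserve capitalisation ("Teh" also becomes "the"); handling that would be the checker's job, not the plugin's.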