
Clean Microsoft Word Pasted Text using JavaScript

I am using a 'contenteditable' <div/> and enabling PASTE.

It is amazing how much markup gets pasted in when the clipboard content was copied from Microsoft Word. I am battling this and have gotten about halfway there using Prototype's stripTags() function (which unfortunately does not seem to let me keep some tags).

However, even after that, I wind up with a mind-blowing amount of unneeded markup code.

So my question is, is there some function (using JavaScript), or approach I can use that will clean up the majority of this unneeded markup?

asked May 20 '10 at 15:05 by OneNerd


1 Answer

Here is the function I wound up writing that does the job fairly well (as far as I can tell anyway).

I am certainly open to suggestions for improvement if anyone has any. Thanks.

function cleanWordPaste(in_word_text) {
  // parse the pasted markup in a detached element and keep only its text
  var tmp = document.createElement("DIV");
  tmp.innerHTML = in_word_text;
  var newString = tmp.textContent || tmp.innerText;

  // this next piece converts line breaks into break tags
  // and removes the seemingly endless crap code
  newString = newString.replace(/\n\n/g, "<br />").replace(/.*<!--.*-->/g, "");

  // this next piece removes any break tags (up to 10) at beginning
  // (i is declared with var so it does not leak into the global scope)
  for (var i = 0; i < 10; i++) {
    if (newString.substr(0, 6) == "<br />") {
      newString = newString.replace("<br />", "");
    }
  }
  return newString;
}
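The function above throws away every tag. Since the question also asks about keeping some of them, here is a minimal sketch of a whitelist variant that works on the pasted string alone. The name cleanWordHtml, the ALLOWED_TAGS list, and the exact patterns are illustrative assumptions and not part of the original answer. It relies on regular expressions, so it is a rough filter for typical Word output rather than a real HTML parser or a security sanitizer.

```javascript
// Hypothetical helper (not from the original answer): keeps a small
// whitelist of tags, strips their attributes, and drops everything else.
var ALLOWED_TAGS = ["p", "br", "b", "i", "strong", "em", "ul", "ol", "li"];

function cleanWordHtml(html, allowedTags) {
  var allowed = allowedTags || ALLOWED_TAGS;

  return html
    // Drop HTML comments, including Word's "<!--[if gte mso 9]>" blocks.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop leftover conditional markers such as "<![endif]>".
    .replace(/<!\[[^\]]*\]>/g, "")
    // Drop whole blocks whose contents should never show up as text.
    .replace(/<(style|script|xml|head|title)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Rewrite each remaining tag: whitelisted names are kept without
    // attributes, everything else (span, font, o:p, ...) is removed.
    .replace(/<\/?([a-z][a-z0-9]*(?::[a-z0-9]+)?)\b[^>]*>/gi, function (tag, name) {
      name = name.toLowerCase();
      if (allowed.indexOf(name) === -1) {
        return "";
      }
      return tag.charAt(1) === "/" ? "</" + name + ">" : "<" + name + ">";
    })
    // Normalize non-breaking spaces and runs of whitespace.
    .replace(/&nbsp;/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}
```

In a paste handler, the result could then be assigned to the editable element, for example editableDiv.innerHTML = cleanWordHtml(pastedHtml), where editableDiv and pastedHtml are placeholders for your own element and the pasted markup. Passing a second argument such as ["p", "br"] narrows the whitelist.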

Hope this is helpful to some of you.

answered Sep 29 '22 at 19:09 by OneNerd
