Extracting text from a contentEditable div

Tags: javascript, html, jquery, css, contenteditable

I have a div set to contentEditable and styled with "white-space:pre" so it keeps things like line breaks. In Safari, FF and IE, the div looks and works pretty much the same. All is well. What I want to do is extract the text from this div in a way that does not lose the formatting, specifically the line breaks.

We are using jQuery, whose text() function basically does a pre-order DFS and glues together all the content in that branch of the DOM into a single lump. This loses the formatting.
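As a rough illustration of why that loses the breaks, here is a minimal sketch of that kind of pre-order concatenation. It runs on plain objects that mimic DOM nodes (only `nodeType`, `nodeValue` and `childNodes`), so it can run outside a browser. The helpers `textNode`, `element` and `concatText` are hypothetical names for this sketch, not jQuery's API.

```javascript
// Hypothetical helpers that build DOM-like node objects for illustration.
function textNode(value) {
  return { nodeType: 3, nodeValue: value, childNodes: [] };
}

function element(tagName, children) {
  return { nodeType: 1, tagName: tagName, nodeValue: null, childNodes: children };
}

// Pre-order DFS that glues all text together, roughly what text() does:
// text (3) and CDATA (4) nodes contribute their value, comments (8) are
// skipped, and every other node is recursed into.
function concatText(nodes) {
  let ret = "";
  for (const node of nodes) {
    if (node.nodeType === 3 || node.nodeType === 4) {
      ret += node.nodeValue;
    } else if (node.nodeType !== 8) {
      ret += concatText(node.childNodes);
    }
  }
  return ret;
}

// Safari-style markup for "1", "2", "3" typed on separate lines:
// 1<div>2</div><div>3</div>
const safariTree = [
  textNode("1"),
  element("DIV", [textNode("2")]),
  element("DIV", [textNode("3")]),
];

console.log(concatText(safariTree)); // 123 (the line structure is gone)
```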

I had a look at the html() function, but all three browsers seem to generate different HTML behind the scenes in my contentEditable div. Suppose I type this into my div:

1
2
3

These are the results:

Safari 4:

1 <div>2</div> <div>3</div>

Firefox 3.6:

1 <br _moz_dirty=""> 2 <br _moz_dirty=""> 3 <br _moz_dirty=""> <br _moz_dirty="" type="_moz">

IE 8:

<P>1</P><P>2</P><P>3</P>

Ugh. Nothing very consistent here. The surprising thing is that MSIE looks the most sane! (Capitalized P tags and all.)

The div will have dynamically set styling (font face, colour, size and alignment) which is done using CSS, so I'm not sure if I can use a pre tag (which was alluded to on some pages I found using Google).

Does anyone know of any JavaScript code and/or jQuery plugin or something that will extract text from a contentEditable div in such a way as to preserve linebreaks? I'd prefer not to reinvent a parsing wheel if I don't have to.

Update: I cribbed the getText function from jQuery 1.4.2 and modified it to extract the text with whitespace mostly intact (I only changed one line, where I add a newline):

```javascript
function extractTextWithWhitespace( elems ) {
    var ret = "", elem;

    for ( var i = 0; elems[i]; i++ ) {
        elem = elems[i];

        // Get the text from text nodes and CDATA nodes
        if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
            // The one changed line: append a newline after each text node
            ret += elem.nodeValue + "\n";

        // Traverse everything else, except comment nodes
        } else if ( elem.nodeType !== 8 ) {
            // Recurse into the same function
            ret += extractTextWithWhitespace( elem.childNodes );
        }
    }

    return ret;
}
```
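To see what this produces for each browser's markup, here is a hedged sketch that runs the same logic (with the recursive call pointing at the function itself) against plain objects that mimic DOM nodes. The node trees are my reading of the three HTML samples above, and the helpers `text` and `el` are hypothetical. Note the caveat at the end: because a newline follows every text node, inline markup such as `<b>` also gets split across lines.

```javascript
// Minimal DOM-like node builders (hypothetical, for illustration only).
const text = (value) => ({ nodeType: 3, nodeValue: value, childNodes: [] });
const el = (tagName, ...children) => ({ nodeType: 1, tagName, nodeValue: null, childNodes: children });

// Same logic as the modified getText, recursing into itself.
function extractTextWithWhitespace(elems) {
  let ret = "";
  for (let i = 0; elems[i]; i++) {
    const elem = elems[i];
    if (elem.nodeType === 3 || elem.nodeType === 4) {
      // Every text/CDATA node is followed by a newline.
      ret += elem.nodeValue + "\n";
    } else if (elem.nodeType !== 8) {
      ret += extractTextWithWhitespace(elem.childNodes);
    }
  }
  return ret;
}

const safari = [text("1"), el("DIV", text("2")), el("DIV", text("3"))];
const firefox = [text("1"), el("BR"), text("2"), el("BR"), text("3"), el("BR"), el("BR")];
const ie = [el("P", text("1")), el("P", text("2")), el("P", text("3"))];

console.log(JSON.stringify(extractTextWithWhitespace(safari)));  // "1\n2\n3\n"
console.log(JSON.stringify(extractTextWithWhitespace(firefox))); // "1\n2\n3\n"
console.log(JSON.stringify(extractTextWithWhitespace(ie)));      // "1\n2\n3\n"

// Caveat: inline markup is split too, e.g. a<b>b</b>c
const inline = [text("a"), el("B", text("b")), text("c")];
console.log(JSON.stringify(extractTextWithWhitespace(inline)));  // "a\nb\nc\n"
```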

I call this function and assign its output to an XML node with jQuery, something like:

```javascript
var extractedText = extractTextWithWhitespace($(this));
var $someXmlNode = $('<someXmlNode/>');
$someXmlNode.text(extractedText);
```

The resulting XML is eventually sent to a server via an AJAX call.

This works well in Safari and Firefox.

On IE, only the first '\n' seems to get retained somehow. Looking into it more, jQuery appears to set the text like so (line 4004 of jQuery-1.4.2.js):

```javascript
return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );
```

Reading up on createTextNode, it appears that IE's implementation may collapse the whitespace. Is this true, or am I doing something wrong?
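If IE does normalize whitespace in that text node, one hedged workaround (not something the question or the answer confirms) is to stop relying on raw newlines surviving inside a single text node: split the extracted text into one element per line and have the server rejoin the lines with "\n". The element names `someXmlNode` and `line` and the functions `escapeXml` and `linesToXml` are hypothetical, and this sketch builds the XML as an escaped string rather than through jQuery.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical: one <line> element per line, so the line breaks are carried
// by the markup itself instead of by whitespace inside one text node.
function linesToXml(extractedText) {
  const lines = extractedText.replace(/\r\n?/g, "\n").split("\n");
  const body = lines.map((line) => "<line>" + escapeXml(line) + "</line>").join("");
  return "<someXmlNode>" + body + "</someXmlNode>";
}

console.log(linesToXml("1\n2 < 3\n"));
// <someXmlNode><line>1</line><line>2 &lt; 3</line><line></line></someXmlNode>
```

The trailing empty `<line>` is deliberate: joining the lines back with "\n" on the server reproduces the original text exactly, including the final newline.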

595

asked Aug 11 '10 06:08

Shaggy Frog

1 Answer

Unfortunately, you still have to handle this for the pre case individually per browser. I don't condone browser detection in most cases (use feature detection instead), but here it's necessary. Luckily, you can take care of them all pretty concisely, like this:

```javascript
var ce = $("<pre />").html($("#edit").html());
if ($.browser.webkit)
  ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
if ($.browser.msie)
  ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
if ($.browser.mozilla || $.browser.opera || $.browser.msie)
  ce.find("br").replaceWith("\n");

var textWithWhiteSpaceIntact = ce.text();
```

You can test it out here. IE in particular is a hassle because of the way it handles &nbsp; and newlines when converting to text. That's why its <p> elements get the <br> treatment above to make its markup consistent with the other browsers, so IE needs two passes to be handled correctly.
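To make the two IE passes concrete, here is a string-only sketch using regular expressions on the IE 8 markup from the question. It is illustrative only: the answer performs these passes with jQuery's replaceWith on real DOM nodes, and regex replacement is not a safe general way to process HTML. The function names `pToBr` and `brToNewline` are hypothetical.

```javascript
// Pass 1 (IE only in the answer): each <p>...</p> becomes its content followed by <br>.
function pToBr(html) {
  return html.replace(/<p\b[^>]*>([\s\S]*?)<\/p>/gi, "$1<br>");
}

// Pass 2: each <br> (with or without attributes) becomes a newline character.
function brToNewline(html) {
  return html.replace(/<br[^>]*>/gi, "\n");
}

const ieHtml = "<P>1</P><P>2</P><P>3</P>";
const afterPass1 = pToBr(ieHtml);           // "1<br>2<br>3<br>"
const afterPass2 = brToNewline(afterPass1); // "1\n2\n3\n"
console.log(JSON.stringify(afterPass2));
```

Only the second pass is needed for Firefox's `<br _moz_dirty="">` markup, which matches the answer applying the <br> replacement to Mozilla and Opera as well as IE.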

In the above, #edit is the ID of the contentEditable element, so just change that, or make this a function, for example:

```javascript
function getContentEditableText(id) {
    var ce = $("<pre />").html($("#" + id).html());
    if ($.browser.webkit)
      ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
    if ($.browser.msie)
      ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
    if ($.browser.mozilla || $.browser.opera || $.browser.msie)
      ce.find("br").replaceWith("\n");

    return ce.text();
}
```

You can test that here. Or, since this is built on jQuery methods anyway, make it a plugin, like this:

```javascript
$.fn.getPreText = function () {
    var ce = $("<pre />").html(this.html());
    if ($.browser.webkit)
      ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
    if ($.browser.msie)
      ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
    if ($.browser.mozilla || $.browser.opera || $.browser.msie)
      ce.find("br").replaceWith("\n");

    return ce.text();
};
```

Then you can just call it with $("#edit").getPreText(); you can test that version here.
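Worth knowing beyond the answer: `$.browser` was deprecated and then removed in jQuery 1.9, so the code above needs jQuery 1.8 or earlier. One hedged alternative that avoids browser sniffing is to walk the nodes and treat all three markups the same way: a `<br>` becomes a newline, and a `<div>` or `<p>` starts and ends a line. The sketch below uses plain DOM-like objects so it runs outside a browser; `getLineText`, `textNode` and `element` are hypothetical names, and the block-tag rules and the trimming of trailing newlines (which also drops intentional trailing blank lines) are assumptions that match only the three samples in the question.

```javascript
// Tags assumed to start a new line (based on the Safari and IE samples).
const BLOCK_TAGS = new Set(["DIV", "P"]);

function getLineText(nodes) {
  let out = "";
  // Start a new line unless we're at the very start or already on a fresh line.
  const ensureNewline = () => {
    if (out !== "" && !out.endsWith("\n")) out += "\n";
  };
  const walk = (list) => {
    for (const node of list) {
      if (node.nodeType === 3 || node.nodeType === 4) {
        out += node.nodeValue;
      } else if (node.nodeType === 1) {
        const tag = node.tagName.toUpperCase();
        if (tag === "BR") {
          out += "\n";
        } else if (BLOCK_TAGS.has(tag)) {
          ensureNewline();
          walk(node.childNodes);
          ensureNewline();
        } else {
          walk(node.childNodes); // inline elements such as <b> stay on the line
        }
      }
      // Comments and other node types are ignored.
    }
  };
  walk(nodes);
  // Drop trailing newlines, e.g. from Firefox's placeholder <br type="_moz">.
  return out.replace(/\n+$/, "");
}

// Hypothetical DOM-like builders for the three samples from the question.
const textNode = (v) => ({ nodeType: 3, nodeValue: v, childNodes: [] });
const element = (tagName, ...childNodes) => ({ nodeType: 1, tagName, nodeValue: null, childNodes });

const safari = [textNode("1"), element("DIV", textNode("2")), element("DIV", textNode("3"))];
const firefox = [textNode("1"), element("BR"), textNode("2"), element("BR"), textNode("3"), element("BR"), element("BR")];
const ie = [element("P", textNode("1")), element("P", textNode("2")), element("P", textNode("3"))];

for (const tree of [safari, firefox, ie]) {
  console.log(JSON.stringify(getLineText(tree))); // "1\n2\n3" for each
}
```

In a browser, the same function can be passed real nodes, e.g. `getLineText(document.getElementById("edit").childNodes)`, since it only reads `nodeType`, `nodeValue`, `tagName` and `childNodes`.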

164

answered Oct 04 '22 03:10

Nick Craver
