most reliable way of getting x pixels worth of text from string, javascript

Question

I have a string, text, in my JavaScript file that contains a lot of text. I also have an element, div#container, that is styled (using separate CSS) with a potentially nonstandard line-height, font-size, font-family, and maybe other properties. It has a fixed height and width.

I'd like to get the maximum amount of text from the string that can fit into div#container without any overflow. What's the best way of doing this?

This needs to be able to work with text formatted with tags, for example:

<strong>Hello person that is this is long and may take more than a</strong> 
line and so on.

Currently, I've got a jQuery plugin that works for plain text; the code follows:

// returns the part of the string that cannot fit into the object
$.fn.func = function(str) {
    var height = this.height();

    this.height("auto");
    while(true) {
        if(str == "") {
            this.height(height);
            return str; // the string is empty, we're done
        }

        var r = sfw(str); // r = [word, rest of string] (sfw is a split-first-word function defined elsewhere)
        var w = r[0], s = r[1];

        var old_html = this.html();
        this.html(old_html + " " + w);

        if(this.height() > height)
        {
            this.html(old_html);
            this.height(height);
            return str; // overflow, return to last working version
        }

        str = s;

    }
}

UPDATE:

The data looks like this:

<ol>
  <li>
     <h2>Title</h2>
     <ol>
        <li>Character</li>
        <ol>
          <li>Line one that might go on a long time, SHOULD NOT BE BROKEN</li>
          <li>Line two can be separated from line one, but not from itself</li>
        </ol>
      </ol>
     <ol>
        <li>This can be split from other</li>
        <ol>
          <li>Line one that might go on a long time, SHOULD NOT BE BROKEN</li>
          <li>Line two can be separated from line one, but not from itself</li>
        </ol>
      </ol>
   </li>  <li>
     <h2>Title</h2>
     <ol>
        <li>Character</li>
        <ol>
          <li>Line one that might go on a long time, SHOULD NOT BE BROKEN</li>
          <li>Line two can be separated from line one, but not from itself</li>
        </ol>
      </ol>
     <ol>
        <li>This can be split from other</li>
        <ol>
          <li>Line one that might go on a long time, SHOULD NOT BE BROKEN</li>
          <li>Line two can be separated from line one, but not from itself</li>
        </ol>
      </ol>
   </li>
</ol>

Maxym · Accepted Answer

Well, let me try to solve it ;) While thinking about a solution I noticed that I don't know enough about your requirements, so I decided to write some simple JavaScript code and show you the result; after trying it you can tell me what's wrong so I can fix or change it. Deal?

I used pure JavaScript, no jQuery (it can be rewritten if needed). The principle is similar to your jQuery plugin:

We take characters one by one (instead of words, as the sfw function does; this can be changed).
If a character is part of an opening tag, the browser does not show it, so I didn't process it in any special way: I just append the characters of the tag one by one and check the height of the container. I'm not sure how bad that is. I mean that when I write container.innerHTML = "My String has a link <a href='#'"; the browser shows "My String has a link", so an "unfinished" tag does not influence the size of the container (at least in all browsers where I tested).
Check the size of the container; if it is bigger than we expect it to be, then the previous string (the current string without its last character) is what we are looking for.
Now we have to close all opening tags that were left unclosed by the cut.

HTML page to test it:

<html>

  <head>
    <style>
    div {
      font-family: Arial;
      font-size: 20px;
      width: 200px;
      height: 25px;
      overflow: hidden;
    }
    </style>
  </head>

  <body>
     <div id="container"> <strong><i>Strong text with <a href="#">link</a> </i> and </strong> simple text </div>

     <script>
     /**
      * this function crops the text inside a div element, leaving the DOM structure valid (as much as possible ;).
      * it also makes the visible part as "big" as possible, meaning that the last visible word will be split
      * to show its first letters if possible
      *
      * @param container {HTMLDivElement} - container which can also have html elements inside
      * @return {String} - visible part of html inside div element given
      */
     function cropInnerText( container ) {
       var fullText = container.innerHTML; // initial html text inside container 
       var realHeight = container.clientHeight; // remember initial height of the container 
       container.style.height = "auto"; // change height to "auto", now div "fits" its content 

       var i = 0;
       var croppedText = "";
       while(true) {
         // if initial container content is the same that cropped one then there is nothing left to do
         if(croppedText == fullText) { 
           container.style.height = realHeight + "px";
           return croppedText;
         }

         // actually append fullText characters one by one...    
         var nextChar = fullText.charAt( i );
         container.innerHTML = croppedText + nextChar;  

         // ... and check current height, if we still fit size needed
         // if we don't, then we found that visible part of string
         if ( container.clientHeight > realHeight ) {
           // take all opening tags in cropped text 
           var openingTags = croppedText.match( /<[^<>\/]+>/g );
           if ( openingTags != null ) {
             // take all closing tags in cropped text 
             var closingTags = croppedText.match( /<\/[^<>]+>/g ) || [];
             // for each opening tags, which are not closed, in right order...
             for ( var j = openingTags.length - closingTags.length - 1; j > -1; j-- ) {
               var openingTag; 
               if ( openingTags[j].indexOf(' ') > -1 ) {
                 // if there are attributes, then we take only tag name
                 openingTag = openingTags[j].substr(1, openingTags[j].indexOf(' ')-1 ) + '>';
               }
               else {
                 openingTag = openingTags[j].substr(1);
               }
               // ... close opening tag to have valid html
               croppedText += '</' + openingTag;
             }
           }

           // return height of container back ... 
           container.style.height = realHeight + "px";
           // ... as well as its visible content 
           container.innerHTML = croppedText;
           return croppedText;
         }

         i++;
         croppedText += nextChar;
       }

     }

     var container = document.getElementById("container");
     var str = cropInnerText( container );
     console.info( str ); // in this case it prints '<strong><i>Strong text with <a href="#">link</a></i></strong>'
   </script>

</body>

</html>

Possible improvements / changes:

I do not create any new DOM elements; I just reuse the current container (to be sure all CSS styles are taken into account). This means its content changes all the time, but after taking the visible text you can write fullText (which I do not modify) back into the container if needed.
Processing the original text word by word would mean fewer changes to the DOM (we would write word by word instead of character by character), so it should be faster. You already have the sfw function, so you can change this easily.
If we have the two words "our sentence", it is possible that only the first one ("our") is visible and "sentence" should be cut off (overflow:hidden works this way). Because I append character by character, my result can be "our sent". Again, this is not a complex part of the algorithm, so based on your jQuery plugin code you can change mine to work with words.
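
A minimal sketch of the word-by-word variant described above. Both function names are hypothetical (this is not the question's sfw), and the fits callback stands in for the "set innerHTML and compare clientHeight" check. It assumes attribute values never contain ">" and that the markup has no comments; a stray "<" that never closes is silently dropped by the tokenizer.

```javascript
// Hypothetical helper: split an HTML string into chunks that can be appended
// one at a time: whole tags, runs of whitespace, and words.
// Assumption: attribute values never contain '>' and there are no comments.
function tokenizeHtml(html) {
  return html.match(/<[^>]*>|\s+|[^<\s]+/g) || [];
}

// Grow the output token by token and stop at the first token that no longer
// fits. `fits` is supplied by the caller; in a browser it would write the
// candidate string into the container and compare clientHeight.
function takeWhileFits(html, fits) {
  var tokens = tokenizeHtml(html);
  var out = "";
  for (var i = 0; i < tokens.length; i++) {
    if (!fits(out + tokens[i])) {
      break;
    }
    out += tokens[i];
  }
  return out;
}
```

Because a tag is always appended as a whole token, the container never sees a half-written tag, and a word is either kept completely or dropped completely. The returned string can still end with unclosed elements, so it needs the same tag-closing step as the character-by-character version.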

Questions, remarks, and bug reports are welcome ;) I tested it in IE9, FF3.6 and Chrome 9.

UPDATE: Regarding the issue with <li>, <h1>, etc. Say I have a container with this content:

<div id="container"> <strong><i>Strong text with <ul><li>link</li></ul> </i> and </strong> simple text </div>

In this case the browser behaves as follows (step by step: what the algorithm writes into the container -> what the browser actually shows):

...
"<strong><i>Strong text with <" -> "<strong><i>Strong text with <"
"<strong><i>Strong text with <u" -> "<strong><i>Strong text with "
"<strong><i>Strong text with <ul" -> "<strong><i>Strong text with <ul></ul>" // well I mean it recognizes ul tag and changes size of container

and the result of the algorithm is the string "Strong text with <u", ending in "<u", which is not nice. What needs to happen in this case is that once we have found our result string ("Strong text with <u" according to the algorithm), we remove the last "unclosed" tag ("<u" in our case). So, before closing the tags to get valid HTML, I added the following:

...
if ( container.clientHeight > realHeight ) {
  /* start of changes */
  var unclosedTags = croppedText.match(/<[\w]*/g) || []; // match() returns null when there is no "<" at all
  var lastUnclosedTag = unclosedTags[ unclosedTags.length - 1 ];
  if ( lastUnclosedTag !== undefined && croppedText.lastIndexOf( lastUnclosedTag ) + lastUnclosedTag.length == croppedText.length ) {
    croppedText = croppedText.substr(0, croppedText.length - lastUnclosedTag.length );
  }
  /* end of changes */
  // take all opening tags in cropped text 
...

This is probably a slightly lazy implementation, but it can be tuned if it turns out to be slow. What it does:

take all tag starts, without the > (in our case it returns ["<strong", "<i", "<u"]);
take the last one ("<u");
if it sits at the very end of the croppedText string, remove it.

After doing this, the result string becomes "Strong text with ".
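
The three steps above can also be collapsed into a single regular expression. This is a sketch of an alternative, not the code used in this answer: stripTrailingPartialTag is a hypothetical name, and it assumes that a literal "<" in text content is written as "&lt;". Unlike the version above, it also removes an unfinished tag that already has attributes, such as "<a href='#'".

```javascript
// Remove an unfinished tag at the very end of the string, e.g. "<u" or
// "<a href='#'". A string that ends with text or with a complete tag is
// returned unchanged.
// Assumption: a literal '<' in text content is escaped as "&lt;".
function stripTrailingPartialTag(html) {
  // "<", then any characters that are not ">", up to the end of the string
  return html.replace(/<[^>]*$/, "");
}

console.log(JSON.stringify(stripTrailingPartialTag("<strong><i>Strong text with <u")));
// prints "<strong><i>Strong text with "
```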

UPDATE2: Thank you for the example. I see that you don't just have nested tags, but that they also form a "tree" structure. Indeed I didn't take that into account, but it can still be fixed ;) At first I wanted to write my own parser, but I kept finding examples where it did not work, so I decided it was better to use an existing one, and there is one: Pure JavaScript HTML Parser. It comes with one caveat:

While this library doesn't cover the full gamut of possible weirdness that HTML provides, it does handle a lot of the most obvious stuff.

but for your example it works. The library doesn't take the position of the opening tag into account, but

we rely on the original HTML structure being fine (not broken);
we close tags at the end of the result "string" (so this is OK).

I think that with these assumptions the library is fine to use. The resulting function then looks like this:

<script src="http://ejohn.org/files/htmlparser.js"></script>
 <script>
 function cropInnerText( container ) {
   var fullText = container.innerHTML;
   var realHeight = container.clientHeight;
   container.style.height = "auto";

   var i = 0;
   var croppedText = "";
   while(true) {
     if(croppedText == fullText) { 
       container.style.height = realHeight + "px";
       return croppedText;
     }

     var nextChar = fullText.charAt( i );
     container.innerHTML = croppedText + nextChar;  

     if ( container.clientHeight > realHeight ) {
       // we still have to remove an unfinished tag (like "<u", with no closing bracket)
       var unclosedTags = croppedText.match(/<[\w]*/g) || []; // match() returns null when there is no "<" at all
       var lastUnclosedTag = unclosedTags[ unclosedTags.length - 1 ];
       if ( lastUnclosedTag !== undefined && croppedText.lastIndexOf( lastUnclosedTag ) + lastUnclosedTag.length == croppedText.length ) {
         croppedText = croppedText.substr(0, croppedText.length - lastUnclosedTag.length );
       }

       // this part is now quite simple ;)
       croppedText = HTMLtoXML(croppedText);

       container.style.height = realHeight + "px";
       container.innerHTML = croppedText ;
       return croppedText;
     }

     i++;
     croppedText += nextChar;
   }

 }
 </script>
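
If pulling in a parser library is not an option, the closing step can be approximated with a small tag stack. The following is only a sketch under the same assumptions as above (the original markup is well formed, the cut does not fall inside a tag, attribute values contain no ">"); closeOpenTags and VOID_TAGS are hypothetical names and are not part of htmlparser.js. Unlike counting opening and closing tags, a stack copes with the "tree" structure, because a closing tag removes its own opener rather than just decrementing a counter.

```javascript
// Elements that never take a closing tag (HTML "void" elements).
var VOID_TAGS = { area: 1, base: 1, br: 1, col: 1, embed: 1, hr: 1, img: 1,
                  input: 1, link: 1, meta: 1, source: 1, track: 1, wbr: 1 };

// Append closing tags for every element that is still open at the end of
// `html`. Assumptions: `html` is a prefix of well-formed markup, no tag is
// cut in half, and attribute values contain no '>'.
function closeOpenTags(html) {
  var stack = []; // names of the elements that are currently open
  var tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*?(\/?)>/g;
  var m;
  while ((m = tagPattern.exec(html)) !== null) {
    var name = m[2].toLowerCase();
    if (m[3] === "/" || VOID_TAGS.hasOwnProperty(name)) {
      continue; // self-closing or void element: nothing to close later
    }
    if (m[1] === "/") {
      // closing tag: drop the matching opener and anything opened after it
      var k = stack.lastIndexOf(name);
      if (k !== -1) {
        stack.length = k;
      }
    } else {
      stack.push(name);
    }
  }
  var closing = "";
  for (var i = stack.length - 1; i >= 0; i--) {
    closing += "</" + stack[i] + ">";
  }
  return html + closing;
}

console.log(closeOpenTags("<ol><li><h2>Title</h2><ol><li>Character</li><ol><li>Line one"));
// prints <ol><li><h2>Title</h2><ol><li>Character</li><ol><li>Line one</li></ol></ol></li></ol>
```

In cropInnerText this function would take the place of the HTMLtoXML call. It does not normalise the markup the way a real parser does, so it is only a reasonable substitute when the input is known to be clean.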

Robert Koritnik · Answer

To get longest possible first line:

Create a DIV with visibility:hidden; (so it still has dimensions) and position it with position:absolute; so it won't break your page flow.
Set its type styles (font and so on) to the same values as your resulting DIV.
Set its height to the same as the resulting DIV but keep width:auto;.
Add the text to it.
Keep cutting off text until the width drops below the resulting DIV's width.

The result is the text you can put in.

If you need to find the number of lines that fit into the container, adjust the algorithm to keep height:auto; and set a fixed width instead.

The same technique is used by auto-adjusting textareas that auto-grow while users type in text.
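
A rough sketch of the adjusted variant (fixed width, height:auto), for plain text only. The function names are hypothetical. Instead of cutting text off step by step, it uses a binary search over the prefix length, which is a swapped-in optimisation and relies on the assumption that adding text never makes the probe shorter. The probe copies only a few font properties and assumes the target has no padding or border; a real page may need more properties copied.

```javascript
// Longest prefix of `text` for which fits(prefix) is true.
// Assumption: fits is monotonic (if a prefix fits, every shorter one does),
// and the empty string always fits.
function longestFittingPrefix(text, fits) {
  var lo = 0;           // longest length known to fit
  var hi = text.length; // longest length that might still fit
  while (lo < hi) {
    var mid = Math.ceil((lo + hi) / 2);
    if (fits(text.slice(0, mid))) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return text.slice(0, lo);
}

// Browser-only: create the hidden, absolutely positioned probe DIV and
// return a predicate telling whether plain text fits the target's height.
// Assumption: the target has no padding or border.
function makeFitsPredicate(target) {
  var style = window.getComputedStyle(target);
  var probe = document.createElement("div");
  probe.style.position = "absolute";   // keep it out of the page flow
  probe.style.visibility = "hidden";   // invisible, but still laid out
  probe.style.height = "auto";
  probe.style.width = target.clientWidth + "px";
  probe.style.fontFamily = style.fontFamily;
  probe.style.fontSize = style.fontSize;
  probe.style.fontWeight = style.fontWeight;
  probe.style.lineHeight = style.lineHeight;
  document.body.appendChild(probe);
  var maxHeight = target.clientHeight;
  return function (prefix) {
    probe.textContent = prefix;
    return probe.clientHeight <= maxHeight;
  };
}

// Usage in a browser (the probe stays in the document until removed):
// var fits = makeFitsPredicate(document.getElementById("container"));
// var visible = longestFittingPrefix(text, fits);
```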

David Bullock · Answer

To solve this, you're going to need additional information:

where should I 'chop' the input text?
having chopped it, how do I repair the two halves so that I can stuff each one into a DIV?

As for the 'where to chop' question, you'll probably have to inject unique <a name="uniq"/> anchor tags at strategic points in your input string (say, before each opening tag in the input?). Then you can test the laid-out position of each anchor and find where to break the input.

Having found the most logical point to break, you'll need to add tags at the end of the first half to close it off, and add tags at the front of the next half to open it. So when you parse your input string to find the opening tags, keep a record of the 'tag stack' at each point where you inject an <a/>. Look up the tag stack that's relevant for this particular break and then add the tags as required.

I can spot 2 gotchas with this:

you'll need to keep more information about each break if the input tags have attributes
you may need to treat some tags as 'unbreakable' and break at an earlier <a/> instead
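
The repair step can be sketched as follows. splitHtmlAt is a hypothetical name; the sketch assumes the break offset falls between tags rather than inside one, that the input is well nested, that attribute values contain no ">", and that void elements such as <br> are either absent or written as <br/>. Choosing the offset from the laid-out anchor positions, and skipping 'unbreakable' tags, is not covered here.

```javascript
// Split `html` at character offset `index` and repair both halves: the first
// half gets closing tags for everything still open, the second half gets the
// same opening tags (attributes included) put back in front of it.
// Assumptions: `index` falls between tags, the input is well nested, and void
// elements are written in self-closing form.
function splitHtmlAt(html, index) {
  var stack = []; // elements open at `index`: { name, open }
  var tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*>/g;
  var m;
  while ((m = tagPattern.exec(html)) !== null && m.index < index) {
    if (m[1] === "/") {
      stack.pop(); // well-nested input: this closes the innermost element
    } else if (!/\/>$/.test(m[0])) {
      stack.push({ name: m[2], open: m[0] }); // keep the full tag text
    }
  }
  var closers = "";
  var openers = "";
  for (var i = stack.length - 1; i >= 0; i--) {
    closers += "</" + stack[i].name + ">";
    openers = stack[i].open + openers;
  }
  return [html.slice(0, index) + closers, openers + html.slice(index)];
}

var html = '<ol class="x"><li>one</li><li>two</li></ol>';
var halves = splitHtmlAt(html, html.indexOf("<li>two"));
console.log(halves[0]); // prints <ol class="x"><li>one</li></ol>
console.log(halves[1]); // prints <ol class="x"><li>two</li></ol>
```

Storing the whole opening tag on the stack, not just its name, is what addresses the first gotcha: the reopened elements in the second half keep their attributes.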

Ultimately, it seems to me you're waiting for HTML5's column construct.

Tags: javascript, html, formatting

Aaron Yodaiken
