 

Split text into pages and present separately (HTML5)


Let's say we have a long text like Romeo & Juliet and we want to present this in a simple ereader (no animations, only pages and custom font-size). What approaches exist to get this?

What I have come up with so far:

  • Using CSS3 columns, it would be possible to load the entire text into memory and style it so that a single column takes the size of an entire page. This turned out to be extremely hard to control, and it requires the entire text to be loaded into memory.
  • Using CSS3 regions (not supported in any major browser) would be the same basic concept as the previous solution, with the major difference that it wouldn't be as hard to control (as every 'column' is a self-contained element).
  • Drawing the text on a canvas would allow you to know exactly where the text ends and thus draw the next page based on that. One advantage is that you only need to load the text up to the current page (still bad, but better). The disadvantage is that the text can't be interacted with (e.g. selected).
  • Placing every single word inside an element and giving every element a unique id (or keeping a logical reference in JavaScript), then using document.elementFromPoint to find the element (word) that is last on the page and showing the next page onward from that word. Although this is the only one that seems realistic to me, the overhead it generates has to be immense.

Yet none of those seems acceptable: the first didn't give enough control to even get it to work, the second isn't supported yet, the third is hard and loses text selection, and the fourth has a ridiculous overhead. Are there any good approaches I haven't thought of yet, or ways to solve one or more disadvantages of the methods mentioned? (Yes, I am aware this is a fairly open question, but the more open it is, the higher the chance of producing relevant answers.)

David Mulder asked Aug 30 '12 17:08





2 Answers

SVG might be a good fit for your text pagination

  • SVG text is actually text -- unlike canvas which displays just a picture of text.

  • SVG text is readable, selectable, searchable.

  • SVG text does not auto-wrap natively, but this is easily remedied using JavaScript.

  • Flexible page sizes are possible because page formatting is done in JavaScript.

  • Pagination does not rely on browser dependent formatting.

  • Text downloads are small and efficient. Only the text for the current page needs to be downloaded.

Here are the details of how SVG pagination can be done and a Demo:

http://jsfiddle.net/m1erickson/Lf4Vt/


Part 1: Efficiently fetch about a page worth of words from a database on the server

Store the entire text in a database with 1 word per row.

Each row (word) is sequentially indexed by the word's order (word#1 has index==1, word#2 has index==2, etc).

For example this would fetch the entire text in proper word order:

    -- select the entire text of Romeo and Juliet
    -- "order by wordIndex" puts the words in their proper order
    Select word from RomeoAndJuliet order by wordIndex

If you assume a page contains about 250 words when formatted, then this database query will fetch the first 250 words of text for page#1:

    -- select the first 250 words for page#1
    Select top 250 word from RomeoAndJuliet order by wordIndex

Now the good part!

Let’s say page#1 used 212 words after formatting. Then when you’re ready to process page#2 you can fetch 250 more words starting at word#213. This results in quick and efficient data fetches.

    -- select 250 more words for page#2
    -- "where wordIndex>212" makes the fetched words
    -- begin with the 213th word in the text
    Select top 250 word from RomeoAndJuliet where wordIndex>212 order by wordIndex
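The cursor arithmetic can be tried out without a database. This is only a minimal sketch: the in-memory `rows` array stands in for the table, and `fetchWordsAfter` is a hypothetical helper name, not part of the demo.

```javascript
// Hypothetical stand-in for the RomeoAndJuliet table: one word per row,
// with wordIndex starting at 1.
const rows = "Two households both alike in dignity".split(" ")
  .map((word, i) => ({ wordIndex: i + 1, word }));

// Mimics: select top <limit> word ... where wordIndex > <lastIndex> order by wordIndex
function fetchWordsAfter(table, lastIndex, limit) {
  return table
    .filter(row => row.wordIndex > lastIndex)
    .sort((a, b) => a.wordIndex - b.wordIndex)
    .slice(0, limit)
    .map(row => row.word);
}

// Page 1 asks for 4 words, but suppose formatting only had room for 3 of them.
const page1 = fetchWordsAfter(rows, 0, 4);
const usedOnPage1 = 3;

// Page 2 therefore starts after word #3, so the unused 4th word is fetched again.
const page2 = fetchWordsAfter(rows, usedOnPage1, 4);
```

The only state the client has to remember between pages is the index of the last word it actually displayed.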

Part 2: Format the fetched words into lines of text that fit into the specified page width

Each line of text must contain enough words to fill the specified page width, but not more.

Start line#1 with a single word and then add words one at a time until the next word would no longer fit in the specified page width.

After the first line is fitted, we move down by a line-height and begin line#2.

Fitting the words on the line requires measuring each additional word added on a line. When the next word would exceed the line width, that extra word is moved to the next line.

A word can be measured using the HTML canvas context.measureText method.

This code will take a set of words (like the 250 words fetched from the database) and will format as many words as possible to fill the page size.

maxWidth is the maximum pixel width of a line of text.

maxLines is the maximum number of lines that will fit on a page.

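    // Requires two variables from the enclosing scope:
    //   ctx       - a 2D canvas context whose font matches the page font
    //   wordCount - running total of words placed so far
    function textToLines(words,maxWidth,maxLines,x,y){
        var lines=[];
        // stop when the words run out or the page is full
        while(words.length>0 && lines.length<maxLines){
            var line=getOneLineOfText(words,maxWidth);
            // drop the words just placed; keep the rest for the next line
            words=words.slice(line.index+1);
            lines.push(line);
            wordCount+=line.index+1;
        }
        return(lines);
    }

    function getOneLineOfText(words,maxWidth){
        var line="";
        var space="";
        for(var i=0;i<words.length;i++){
            // measure the line as it would look with the next word appended
            var testWidth=ctx.measureText(line+space+words[i]).width;
            // the next word does not fit: return the line built so far
            // (i>0 keeps a single over-wide word on a line of its own)
            if(i>0 && testWidth>maxWidth){return({index:i-1,text:line});}
            line+=space+words[i];
            space=" ";
        }
        // all remaining words fit on this line
        return({index:words.length-1,text:line});
    }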
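To exercise the wrapping logic outside a browser, the measuring step can be passed in as a function. The sketch below is an illustrative variant, not the demo's code: `wrapWords` and `fixedWidth` are hypothetical names, and the 10px-per-character font is an assumption made purely for testing. In a browser, `measure` could be `text => ctx.measureText(text).width`.

```javascript
// Greedy line filling: keep adding words while the measured line still fits.
// Returns the lines plus the number of words consumed, so the caller knows
// where the next page starts.
function wrapWords(words, maxWidth, maxLines, measure) {
  const lines = [];
  let used = 0;                 // how many words have been placed so far
  while (used < words.length && lines.length < maxLines) {
    let line = words[used];     // always place at least one word, even if too wide
    let next = used + 1;
    while (next < words.length &&
           measure(line + " " + words[next]) <= maxWidth) {
      line += " " + words[next];
      next++;
    }
    lines.push(line);
    used = next;
  }
  return { lines, used };
}

// Assumed fake font for testing: every character is 10px wide.
const fixedWidth = text => text.length * 10;

const result = wrapWords(
  ["But", "soft", "what", "light", "through", "yonder", "window", "breaks"],
  120,   // max line width in px (12 characters in the fake font)
  3,     // max lines per page
  fixedWidth);
// result.lines -> ["But soft", "what light", "through"], result.used -> 5
```

Returning `used` instead of updating a shared counter keeps the function free of outside state, which makes it easier to test.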

Part 3: Display the lines of text using SVG

The SVG text element is a true DOM element, so its text can be read, selected and searched.

Each individual line of text in the SVG Text element is displayed using an SVG Tspan element.

This code takes the lines of text which were formatted in Part#2 and displays the lines as a page of text using SVG.

    // Requires two variables from the enclosing scope:
    //   lineHeight - line spacing in pixels
    //   $page      - jQuery object for the div that receives the page
    function drawSvg(lines,x){
        var svgNS='http://www.w3.org/2000/svg';
        var svg = document.createElementNS(svgNS, 'svg');
        var sText = document.createElementNS(svgNS, 'text');
        sText.setAttributeNS(null, 'font-family', 'verdana');
        sText.setAttributeNS(null, 'font-size', "14px");
        sText.setAttributeNS(null, 'fill', '#000000');
        // one tspan per line: x resets to the left margin, dy moves down one line
        for(var i=0;i<lines.length;i++){
            var sTSpan = document.createElementNS(svgNS, 'tspan');
            sTSpan.setAttributeNS(null, 'x', x);
            sTSpan.setAttributeNS(null, 'dy', lineHeight+"px");
            sTSpan.appendChild(document.createTextNode(lines[i].text));
            sText.appendChild(sTSpan);
        }
        svg.appendChild(sText);
        $page.append(svg);
    }
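For reference, the element tree built above corresponds to fairly simple markup. The following sketch produces that markup as a string, which can help when inspecting the output or generating pages outside the browser. `linesToSvgMarkup` and `escapeXml` are hypothetical helpers, and here `lines` is assumed to be an array of plain strings rather than the `{index,text}` objects used in Part 2.

```javascript
// Escape the characters that are significant in XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: returns markup equivalent to the text/tspan structure
// built with createElementNS (same attributes, one tspan per line).
function linesToSvgMarkup(lines, x, lineHeight) {
  const tspans = lines
    .map(text => `<tspan x="${x}" dy="${lineHeight}px">${escapeXml(text)}</tspan>`)
    .join("");
  return `<svg xmlns="http://www.w3.org/2000/svg">` +
         `<text font-family="verdana" font-size="14px" fill="#000000">` +
         `${tspans}</text></svg>`;
}

const markup = linesToSvgMarkup(["Romeo & Juliet", "Act I"], 10, 18);
```

Because every tspan repeats the same `x` and a relative `dy`, each line starts at the left margin one line-height below the previous one.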

Here is complete code just in case the Demo link breaks:

    <!doctype html>
    <html>
    <head>
    <link rel="stylesheet" type="text/css" media="all" href="css/reset.css" /> <!-- reset css -->
    <script type="text/javascript" src="http://code.jquery.com/jquery.min.js"></script>
    <style>
        body{ background-color: ivory; }
        .page{border:1px solid red;}
    </style>
    <script>
    $(function(){

        // off-screen canvas, used only to measure text widths
        var canvas=document.createElement("canvas");
        var ctx=canvas.getContext("2d");
        ctx.font="14px verdana";

        var pageWidth=250;
        var pageHeight=150;
        var pagePaddingLeft=10;
        var pagePaddingRight=10;
        var approxWordsPerPage=500;
        var lineHeight=18;
        var maxLinesPerPage=Math.floor(pageHeight/lineHeight);
        var x=pagePaddingLeft;
        var y=lineHeight;
        var maxWidth=pageWidth-pagePaddingLeft-pagePaddingRight;

        // # words that have been displayed
        // (used when ordering a new page of words)
        var wordCount=0;

        // size the divs to the desired page size
        var $pages=$(".page");
        $pages.width(pageWidth);
        $pages.height(pageHeight);

        // Test: Page#1

        // get a reference to the page div
        var $page=$("#page");
        // use html canvas to word-wrap this page
        var lines=textToLines(getNextWords(wordCount),maxWidth,maxLinesPerPage,x,y);
        // create svg elements for each line of text on the page
        drawSvg(lines,x);

        // Test: Page#2 (just testing...normally there's only 1 full-screen page)
        $page=$("#page2");
        lines=textToLines(getNextWords(wordCount),maxWidth,maxLinesPerPage,x,y);
        drawSvg(lines,x);

        // Test: Page#3 (just testing...normally there's only 1 full-screen page)
        $page=$("#page3");
        lines=textToLines(getNextWords(wordCount),maxWidth,maxLinesPerPage,x,y);
        drawSvg(lines,x);

        // fetch the next page of words from the server database
        // (since we've specified the starting point in the entire text
        //  we only have to download 1 page of text as needed)
        function getNextWords(nextWordIndex){
            // Eg: select top 500 word from romeoAndJuliet
            //     where wordIndex>=nextwordIndex
            //     order by wordIndex
            //
            // But here for testing, we just hardcode the entire text
            var testingText="Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
            var testingWords=testingText.split(" ");
            var words=testingWords.slice(nextWordIndex,nextWordIndex+approxWordsPerPage);
            return(words);
        }

        function textToLines(words,maxWidth,maxLines,x,y){
            var lines=[];
            // stop when the words run out or the page is full
            while(words.length>0 && lines.length<maxLines){
                var line=getLineOfText(words,maxWidth);
                words=words.slice(line.index+1);
                lines.push(line);
                wordCount+=line.index+1;
            }
            return(lines);
        }

        function getLineOfText(words,maxWidth){
            var line="";
            var space="";
            for(var i=0;i<words.length;i++){
                var testWidth=ctx.measureText(line+space+words[i]).width;
                // i>0 keeps a single over-wide word on a line of its own
                if(i>0 && testWidth>maxWidth){return({index:i-1,text:line});}
                line+=space+words[i];
                space=" ";
            }
            return({index:words.length-1,text:line});
        }

        function drawSvg(lines,x){
            var svg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
            var sText = document.createElementNS('http://www.w3.org/2000/svg', 'text');
            sText.setAttributeNS(null, 'font-family', 'verdana');
            sText.setAttributeNS(null, 'font-size', "14px");
            sText.setAttributeNS(null, 'fill', '#000000');
            for(var i=0;i<lines.length;i++){
                var sTSpan = document.createElementNS('http://www.w3.org/2000/svg', 'tspan');
                sTSpan.setAttributeNS(null, 'x', x);
                sTSpan.setAttributeNS(null, 'dy', lineHeight+"px");
                sTSpan.appendChild(document.createTextNode(lines[i].text));
                sText.appendChild(sTSpan);
            }
            svg.appendChild(sText);
            $page.append(svg);
        }

    }); // end $(function(){});
    </script>
    </head>
    <body>
        <h4>Text split into "pages"<br>(Selectable &amp; Searchable)</h4>
        <div id="page" class="page"></div>
        <h4>Page 2</h4>
        <div id="page2" class="page"></div>
        <h4>Page 3</h4>
        <div id="page3" class="page"></div>
    </body>
    </html>
markE answered Sep 30 '22 05:09



See my answer to "Wrap text every 2500 characters in a <div> for pagination using PHP or javascript". I ended up with http://jsfiddle.net/Eric/WTPzn/show

Quoting the original post:

Just set your HTML to:

<div id="target">...</div> 

Add some css for pages:

    #target {
        white-space: pre-wrap; /* respect line breaks */
    }
    .individualPage {
        border: 1px solid black;
        padding: 5px;
    }

And then use the following code:

    var contentBox = $('#target');
    // get the text as an array of word-like things
    var words = contentBox.text().split(' ');

    function paginate() {
        // create a div to build the pages in
        var newPage = $('<div class="individualPage" />');
        contentBox.empty().append(newPage);

        // start off with no page text
        var pageText = null;
        for(var i = 0; i < words.length; i++) {
            // add the next word to the pageText
            var betterPageText = pageText ? pageText + ' ' + words[i]
                                          : words[i];
            newPage.text(betterPageText);

            // check if the page is too long
            // (a single word that is too tall still gets a page of its own)
            if(pageText !== null && newPage.height() > $(window).height()) {
                // revert the text
                newPage.text(pageText);

                // and insert a copy of the page at the start of the document
                newPage.clone().insertBefore(newPage);

                // start a new page with the word that did not fit
                pageText = words[i];
            } else {
                // this longer text still fits
                pageText = betterPageText;
            }
        }
        // make sure the last page shows the remaining text
        newPage.text(pageText === null ? '' : pageText);
    }

    // paginate now and again whenever the window is resized
    $(window).on('resize', paginate).trigger('resize');
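The loop above re-measures the page once per word. If that turns out to be too slow for a long text, one possible variation is to binary-search for the number of words that fit, assuming that adding words never makes a page shorter. The sketch below is not part of the original answer: `largestFittingCount`, `paginateWords` and the 20-character "page" are hypothetical, and the predicate stands in for the real height check (for example setting `newPage.text(...)` and comparing `newPage.height()` with the window height).

```javascript
// Returns the largest n in [1, total] for which fits(n) is true.
// Assumes fits is monotonic: if n words do not fit, n + 1 words do not either.
function largestFittingCount(total, fits) {
  let lo = 1;          // at least one word per page, so paging always advances
  let hi = total;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (fits(mid)) {
      lo = mid;        // mid words fit, try more
    } else {
      hi = mid - 1;    // mid words overflow, try fewer
    }
  }
  return lo;
}

// Splits words into pages; fitsOnPage(pageWords) decides whether a candidate fits.
function paginateWords(words, fitsOnPage) {
  const pages = [];
  let start = 0;
  while (start < words.length) {
    const rest = words.slice(start);
    const count = largestFittingCount(rest.length, n => fitsOnPage(rest.slice(0, n)));
    pages.push(rest.slice(0, count));
    start += count;
  }
  return pages;
}

// Stand-in predicate for testing: a "page" holds at most 20 characters.
const demoPages = paginateWords(
  "A glooming peace this morning with it brings".split(" "),
  pageWords => pageWords.join(" ").length <= 20);
// demoPages -> [["A","glooming","peace"], ["this","morning","with","it"], ["brings"]]
```

Each page then costs roughly log2(n) measurements instead of one per word, at the price of slightly more bookkeeping.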
Eric answered Sep 30 '22 07:09
