
How to optimize the HTML text copied from MS Word with GWT?

I'm having a problem with RichTextArea. When I paste text copied from MS Word or OpenOffice into a RichTextArea, it keeps all the text styles, which is perfect. The bad part is that the resulting HTML is huge, and the database keeps growing because of unnecessary HTML tags.

My question is: how can I optimize that HTML easily?

Thanks!!!

Jama A. asked May 28 '11 13:05



1 Answer

RichTextArea is based on the browser's contentEditable support. This means that the HTML "tag soup" you wind up with is going to be platform-, source-, and browser-specific. When you say "optimize", what's your end goal? How much of the original formatting do you want to preserve? Beyond trivial minification of the pasted HTML, any significant reduction in its complexity will likely result in a loss of visual fidelity.
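To make "trivial minification" concrete, here is a minimal sketch in plain Java, meant to run on the server before the HTML is stored. The class name `PastedHtmlMinifier` and its `minify` method are hypothetical, not part of GWT. It only drops HTML comments (which is where Word puts its conditional-comment blocks) and collapses runs of whitespace. It is regex-based, so it will also collapse whitespace inside `<pre>` blocks and attribute values; treat it as an illustration rather than a robust tool.

```java
import java.util.regex.Pattern;

// Hypothetical helper: lossless-ish minification of pasted HTML.
final class PastedHtmlMinifier {
    // HTML comments, including Word conditional comments such as
    // <!--[if gte mso 9]> ... <![endif]-->
    private static final Pattern COMMENTS =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Any run of whitespace (spaces, tabs, newlines)
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private PastedHtmlMinifier() {}

    static String minify(String html) {
        // 1. Remove comments entirely.
        String out = COMMENTS.matcher(html).replaceAll("");
        // 2. Collapse each whitespace run to a single space.
        //    Assumption: the content has no <pre> blocks that rely on
        //    exact whitespace.
        out = WHITESPACE.matcher(out).replaceAll(" ");
        return out.trim();
    }
}
```

For example, `PastedHtmlMinifier.minify("<p>  Hello\n\n  <!-- note -->world </p>  ")` returns `<p> Hello world </p>`. The savings from this step alone are usually modest, because most of the bulk in Word output sits in attributes and wrapper tags rather than in whitespace.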

Utilities such as HTML Tidy or any of its derivatives can probably help you with the minification aspect. If your goal is to reduce the complexity of the HTML, you might consider using HtmlUnit as a captive, server-side browser to render the pasted content in memory and then extract the attributes that you consider useful from HtmlUnit's DOM. FWIW, running a server-side browser like this is also one way to make AJAX apps crawlable by search engines.
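If pulling in HTML Tidy or HtmlUnit is more than you need, a much cruder way to reduce complexity is to strip the markup that Word and OpenOffice add for presentation. The sketch below is a swapped-in, regex-based approach using only the Java standard library, not the Tidy or HtmlUnit route described above. The class name `WordMarkupStripper` is hypothetical, and the choice of what to drop (namespaced Office tags, `class`/`style`/`lang` attributes, `<span>` and `<font>` wrappers) is an assumption about what counts as "unnecessary". Regexes are not an HTML parser, so this can misfire on unusual input, such as a `>` inside an attribute value.

```java
import java.util.regex.Pattern;

// Hypothetical helper: drops Word/OpenOffice presentation markup.
// This is lossy: colors, fonts and sizes set via style="..." disappear.
final class WordMarkupStripper {
    // Namespaced Office tags such as <o:p>, </o:p>, <w:WordDocument>
    private static final Pattern OFFICE_TAGS =
            Pattern.compile("</?\\w+:\\w+[^>]*>");

    // <span ...> and <font ...> wrappers (opening and closing tags only;
    // the text they wrap is kept)
    private static final Pattern WRAPPERS =
            Pattern.compile("</?(?:span|font)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // class="...", style="...", lang="..." with single or double quotes
    private static final Pattern PRESENTATION_ATTRS = Pattern.compile(
            "\\s+(?:class|style|lang)\\s*=\\s*(?:\"[^\"]*\"|'[^']*')",
            Pattern.CASE_INSENSITIVE);

    private WordMarkupStripper() {}

    static String strip(String html) {
        String out = OFFICE_TAGS.matcher(html).replaceAll("");
        out = WRAPPERS.matcher(out).replaceAll("");
        out = PRESENTATION_ATTRS.matcher(out).replaceAll("");
        return out;
    }
}
```

Given `<p class="MsoNormal" style="margin:0cm"><span lang="EN-US" style="font-size:11pt">Hello <b>world</b></span><o:p></o:p></p>`, `strip` returns `<p>Hello <b>world</b></p>`. Structural tags such as `<b>`, `<p>`, and lists survive, while anything expressed only through inline styles is lost.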

While reducing visual fidelity can be a little disconcerting to the original user, it does afford you the opportunity to unify the visual style of all pasted content. If you're building a site based on contributions from many users, this homogeneity decreases the mental effort readers need to orient themselves in the content (i.e. to make sense of what they're seeing).

BobV answered Sep 28 '22 06:09
