I am writing a basic word processing application and am trying to settle on a native "internal" format, the one that my code parses in order to render to the screen. I'd like this to be XML so that I can, in the future, just write XSLT to convert it to ODF or XHTML or whatever.
When searching for existing standards to use, the only one that looks promising is ODF. But that looks like massive overkill for what I need. All I need is paragraph tags, font selection, font size & decoration...that's pretty much it. It would take me a long time to implement even a minimal ODF renderer, and I'm not sure it's worth the trouble.
Right now I'm thinking of making my own XML format, but that's not really good practice. Better to use a standard, especially since then I can probably find the XSLTs I might need in the future already written.
Or should I just bite the bullet and implement ODF?
EDIT: Regarding the Answer
I knew about XSL-FO before, but due to the weight of the spec hadn't really considered it. But you're right, a subset would give me everything I need to work with and room to grow. Thanks so much for the reminder.
Plus, by including a rendering library like FOP or RenderX, I get PDF generation for free. Not bad...
As you are sure about needing to represent the presentational side of things, it may be worth looking at the XSL-FO W3C Recommendation. This is a full-blown page description language and the (deeply unfashionable) other half of the better-known XSLT.
Clearly the whole thing is anything but "lightweight", but you could incorporate just a very limited subset. To match your spec of "paragraph tags, font selection, font size & decoration", that could even be just fo:block and the common font properties, something like:
<yourcontainer xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:block font-family="Arial, sans-serif" font-weight="bold"
            font-size="16pt">Example Heading</fo:block>
  <fo:block font-family="Times, serif"
            font-size="12pt">Paragraph text here etc etc...</fo:block>
</yourcontainer>
This would perhaps have a few advantages over just rolling your own. There's an open specification to work from, and all that implies. It reuses CSS properties as XML attributes (in a similar manner to SVG), so many of the formatting details will seem somewhat familiar. You'd have an upgrade path if you later decided that, say, intelligent paging was a must-have feature - including more sections of the spec as they become relevant to your application.
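To give a feel for how little code such a subset needs on the reading side, here is a rough Python sketch that pulls the fo:block elements and their font attributes out of a document like the one above. It uses only the standard library's xml.etree.ElementTree. The names parse_blocks and SUPPORTED are made up for illustration, and the list of properties is just one guess at what "the common font properties" might mean for your application.

```python
import xml.etree.ElementTree as ET

# XSL-FO namespace, as declared in the sample document.
FO_NS = "http://www.w3.org/1999/XSL/Format"
BLOCK_TAG = f"{{{FO_NS}}}block"

# Hypothetical subset of properties this application chooses to honour.
SUPPORTED = ("font-family", "font-size", "font-weight",
             "font-style", "text-decoration")


def parse_blocks(xml_text):
    """Return one dict per fo:block, holding its text and supported attributes."""
    root = ET.fromstring(xml_text)
    blocks = []
    for elem in root.iter(BLOCK_TAG):
        # Keep only the attributes in the chosen subset; ignore everything else.
        style = {name: elem.get(name)
                 for name in SUPPORTED if elem.get(name) is not None}
        # itertext() also collects text from child elements, so nested
        # blocks would need extra handling in a real renderer.
        text = "".join(elem.itertext()).strip()
        blocks.append({"text": text, "style": style})
    return blocks


SAMPLE = """<yourcontainer xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:block font-family="Arial, sans-serif" font-weight="bold"
font-size="16pt">Example Heading</fo:block>
<fo:block font-family="Times, serif"
font-size="12pt">Paragraph text here etc etc...</fo:block>
</yourcontainer>"""

if __name__ == "__main__":
    for block in parse_blocks(SAMPLE):
        print(block["style"], block["text"])
```

Unknown attributes are silently dropped here, which is one way of keeping the subset small now while leaving room to honour more of the spec later.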
There's one other thing you might get from investigating XSL-FO - seeing how even just-doing-paragraphs-and-fonts can be horrendously complicated. Trying to do text layout and line breaking 'The Right Way' for various different languages and use cases seems very daunting to me.
If it's only for word processing, then perhaps DocBook might be a little lighter than ODF?
However, the wiki entry states:
DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software but it can be used for any other sort of documentation.
So it might not be so suitable for a general-purpose word-processor?
The advantage of using DocBook would be that a number of converters from DocBook to other formats should already be available. Hope this helps.