
SGML parser in Java? [closed]

Tags:

java

sgml

I'm looking for a parser in Java that can parse a document formatted in SGML.

For duplicate monitors: I'm aware of the two other threads that discuss this topic, "Parsing Java String with SGML" and "Java SGML to XML conversion?". Neither has a resolution, hence the new topic.

For people who confuse XML with SGML: please read http://www.w3.org/TR/NOTE-sgml-xml-971215#null (in short, there are enough subtle differences to make an XML parser unusable for SGML in its vanilla form).

For people who are fond of asking posters to Google it: I already did, and the closest I could come up with was the widely popular SAXParser: http://download.oracle.com/javase/1.4.2/docs/api/javax/xml/parsers/SAXParser.html But that, of course, is meant to be an XML parser. I'm looking around to see if anyone has implemented a modification of the SAX parser to accommodate SGML.

Lastly, I cannot use SX as I'm looking for a Java solution.

Thanks! :)

user183037 asked Feb 01 '11 21:02


3 Answers

I have a few approaches to this problem:

The first is what you did -- check to see if the SGML document is close enough to XML for the standard SAX parser to work.

The second is to do the same with HTML parsers. The trick here is to find one that doesn't ignore non-HTML elements.

I did find some Java SGML parsers, mostly from academia, when searching for "sgml parser Java". I do not know how well they work.

The last approach is to take a standard (non-Java) SGML parser and transform the documents into something you can read in Java.

It looks like you were able to work with the first approach.
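
To illustrate the first approach, here is a minimal sketch of how one might probe whether a given document is close enough to XML for the standard SAX parser. The class name `XmlProbe`, the method `firstXmlError`, and the sample inputs are made up for this illustration; only the JDK's `javax.xml.parsers` and `org.xml.sax` classes are used.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a text parses as well-formed XML.
class XmlProbe {

    /** Returns null if the text is well-formed XML, otherwise a description of the first fatal error. */
    static String firstXmlError(String text) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content events and rethrows fatal errors.
            parser.parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Fully tagged and quoted: this SGML instance is also well-formed XML.
        System.out.println(firstXmlError("<doc><p>hello</p></doc>"));
        // Omitted end tags are legal in SGML (if the DTD allows it) but not in XML.
        System.out.println(firstXmlError("<doc><p>one<p>two</doc>"));
    }
}
```

If the probe returns an error for your documents (omitted end tags, unquoted attribute values, and so on), the plain SAX parser will not be enough and one of the other approaches is needed.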

Kathy Van Stone answered Nov 26 '22 07:11


I use OpenSP via JNI, as it seems there is no pure Java SGML parser. I've written an experimental SAX-like wrapper that is available at http://sourceforge.net/projects/sasgml (of course, it has all the drawbacks of JNI... but was enough for my requirements).

Another approach is converting the document to XML by using sx from OpenSP, and then running a traditional SAX parser.
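
A rough sketch of that second approach follows, assuming the converter writes the XML to its standard output when given the SGML file as its only argument. The class `SxPipeline` and its method names are invented for this example, and the executable name is passed in as a parameter because it depends on the installation (for example `sx`, or `osx` in some OpenSP packages). No command-line options are assumed.

```java
import java.io.File;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical glue code: external SGML-to-XML converter piped into SAX.
class SxPipeline {

    /** Parses XML from a stream and returns the element names in document order. */
    static List<String> elementNames(InputStream xml) throws Exception {
        List<String> names = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        });
        return names;
    }

    /** Runs the converter on the SGML file and feeds its standard output to the SAX parser. */
    static List<String> convertAndParse(String converter, File sgml) throws Exception {
        Process process = new ProcessBuilder(converter, sgml.getPath())
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show converter diagnostics
                .start();
        try (InputStream xml = process.getInputStream()) {
            List<String> names = elementNames(xml);
            process.waitFor();
            return names;
        } finally {
            process.destroy();
        }
    }
}
```

A call would look like `SxPipeline.convertAndParse("osx", new File("doc.sgml"))`, where both the command and the file name are placeholders. A real handler would do more than collect element names, and you would probably want to check the converter's exit status as well.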

Javier answered Nov 26 '22 07:11


There is no API for parsing SGML using Java at this time. There also isn't any API or library for converting SGML to XML and then parsing it using Java. Since XML has supplanted SGML on every project I've worked on so far, I don't think any work will ever be done in this area, but that is only a guess.

Here is some open source code from a university that does it; however, I haven't tried it, and you would have to search to find the other dependent classes. I believe the only viable solution in Java would require regular expressions.
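
To give an idea of what a regular-expression approach could look like, here is a very rough tokenizer sketch. The class `SgmlTokens`, the pattern, and the token format are all invented for this example. It only recognizes simple start tags, end tags, and character data; it does not handle declarations, comments, marked sections, entities, or attribute values that contain `>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based tokenizer for simple SGML instances.
class SgmlTokens {

    // Group 1: "/" for an end tag; group 2: element name; group 3: character data.
    private static final Pattern TOKEN =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>|([^<]+)");

    /** Splits the input into START:NAME, END:NAME and TEXT:... tokens. */
    static List<String> tokenize(String sgml) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sgml);
        while (m.find()) {
            if (m.group(3) != null) {
                String text = m.group(3).trim();
                if (!text.isEmpty()) {
                    tokens.add("TEXT:" + text);
                }
            } else {
                // Upper-case the name so that <p> and <P> are treated alike.
                String name = m.group(2).toUpperCase(Locale.ROOT);
                tokens.add((m.group(1).isEmpty() ? "START:" : "END:") + name);
            }
        }
        return tokens;
    }
}
```

Because the tokenizer does not require matching end tags, `SgmlTokens.tokenize("<doc><p>one<p>two</doc>")` yields two `START:P` tokens with no `END:P`. Working out where the omitted end tags belong would still need knowledge of the DTD, which is where a regex-only solution runs out.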

Also, here is a link for public SGML/XML software.

James Drinkard answered Nov 26 '22 06:11