 

Parse HTML "style" attribute using Java

I have HTML code parsed into an org.w3c.dom.Document. I need to check the style attributes of all tags, parse them, change some CSS properties, and put the modified style definitions back into the attributes.

Are there any standard ways to parse a style attribute? How can I use the classes and interfaces from the org.w3c.dom.css package?

I need a Java solution.

Andrey asked Nov 23 '10 13:11



2 Answers

If you want a way to do this without any dependencies, you can use the javax.swing.text.html package classes to get you most of the way there:

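import javax.swing.text.AttributeSet;
import javax.swing.text.html.CSS;
import javax.swing.text.html.StyleSheet;

StyleSheet styleSheet = new StyleSheet();
// Parse an inline declaration; shorthands such as "margin" are expanded per side
AttributeSet dec = styleSheet.getDeclaration("margin:2px;padding:3px");
Object marginLeft = dec.getAttribute(CSS.Attribute.MARGIN_LEFT);
String marginLeftString = marginLeft.toString(); // "2px"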

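getAttribute returns an instance of Swing's internal CssValue class, which is unfortunately not public, hence the need to convert it to a String. Also, it won't handle em units. It is reasonably smart about various properties, though: the margin shorthand above is expanded into margin-left and the other sides. Not ideal, but it avoids dependencies.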
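
To cover the full round trip from the question (parse, change, write back) with only these Swing classes, here is a minimal sketch. It assumes that converting each key and value with toString() is acceptable, since the value class is not public. The class name SwingStyleDemo and the helpers parse and serialize are hypothetical, and properties Swing does not recognise are silently dropped by getDeclaration.

```java
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.CSS;
import javax.swing.text.html.StyleSheet;

public class SwingStyleDemo {

    // Hypothetical helper: turns an inline style value into "property -> value" text
    // pairs via Swing's StyleSheet. Shorthands like "margin" arrive expanded
    // (margin-top, margin-left, ...); unknown properties are dropped by Swing.
    static Map<String, String> parse(String style) {
        StyleSheet styleSheet = new StyleSheet();
        AttributeSet dec = styleSheet.getDeclaration(style);
        Map<String, String> props = new LinkedHashMap<>();
        Enumeration<?> names = dec.getAttributeNames();
        while (names.hasMoreElements()) {
            Object name = names.nextElement();
            // Keys are CSS.Attribute constants whose toString() is the CSS property
            // name; values are Swing-internal objects, so toString() is all we use.
            props.put(name.toString(), dec.getAttribute(name).toString());
        }
        return props;
    }

    // Hypothetical helper: rebuilds text suitable for a style attribute.
    static String serialize(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> e : props.entrySet()) {
            joiner.add(e.getKey() + ": " + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = parse("margin:2px;padding:3px");
        // Change one property, keyed by the CSS.Attribute constant's name.
        props.put(CSS.Attribute.MARGIN_LEFT.toString(), "5px");
        System.out.println(serialize(props));
    }
}
```

The output lists the expanded longhand properties rather than the original shorthand, and their order depends on Swing's internal attribute set, so don't rely on it.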

Sam Barnum answered Nov 11 '22 22:11



First, I would check out the classes in the javax.xml packages. The javax.xml.parsers package contains parsers for two styles of parsing: SAXParser and DocumentBuilder. It sounds like you want the DocumentBuilder to create a DOM. You can either traverse the DOM manually (slow and painful), or you can use the XPath standard to look up elements in the DOM. Java support for that is in javax.xml.xpath.

import javax.xml.xpath.*;
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//@style");
Object results = expr.evaluate(dom, XPathConstants.NODESET);

It's your responsibility to cast the results to a NodeList and iterate over them properly, but it's the most direct way to get at what you want. Check out Java's DOM API for more information about reading and changing values.
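
A self-contained sketch of that approach: it parses a small XHTML string with a DocumentBuilder only to obtain a Document (the question already has one), selects every style attribute with XPath, casts the result to a NodeList, and writes each modified value back with Attr.setValue. The class name, the helper rewriteStyles, and the plain string replacement standing in for real CSS editing are all hypothetical, and the markup is assumed to be well-formed XML.

```java
import java.io.StringReader;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StyleXPathDemo {

    // Hypothetical helper: applies `rewrite` to every style attribute value in the
    // document and returns how many style attributes were found.
    static int rewriteStyles(Document dom, UnaryOperator<String> rewrite) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//@style" selects every style attribute node anywhere in the document.
        NodeList results = (NodeList) xpath.compile("//@style")
                .evaluate(dom, XPathConstants.NODESET);
        for (int i = 0; i < results.getLength(); i++) {
            Attr style = (Attr) results.item(i);
            // Setting the value updates the DOM in place.
            style.setValue(rewrite.apply(style.getValue()));
        }
        return results.getLength();
    }

    // Demo only: assumes well-formed XHTML, because DocumentBuilder is an XML parser.
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse("<html><body><p style=\"color:red\">a</p>"
                + "<div style=\"margin:0\"><span style=\"color:red\">b</span></div>"
                + "</body></html>");
        int count = rewriteStyles(dom, s -> s.replace("color:red", "color:blue"));
        Element p = (Element) dom.getElementsByTagName("p").item(0);
        // prints: 3 style attributes, <p> style=color:blue
        System.out.println(count + " style attributes, <p> style=" + p.getAttribute("style"));
    }
}
```

In practice the lambda passed to rewriteStyles is where a real CSS parse/modify/serialize step would go.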

I don't believe Java has a built-in CSS parser, but you can look at these projects:

  • SAC, the W3C Simple API for CSS: http://www.w3.org/Style/CSS/SAC/Overview.en.html
  • CSS Parser, a Java implementation of SAC: http://cssparser.sourceforge.net/

These may help you with your goals. NOTE: the Batik CSS parser is part of the larger Apache Batik project (http://xmlgraphics.apache.org/batik/index.html), which may include more than you need, but it is released under the corporate-friendly Apache License.
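
For simple inline styles, a naive hand-rolled split can be enough for the parse/change/write-back round trip before reaching for one of those libraries. This is a swapped-in technique, not a real CSS parser and not any of the projects above; the class NaiveStyle and its methods are hypothetical. It splits declarations on ';' and each declaration on its first ':', so it breaks on semicolons inside quoted strings or url(...) and ignores CSS comments.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

public class NaiveStyle {

    // Naive parse: split on ';', then split each declaration on its first ':'.
    // Not CSS-aware: ';' inside strings or url(...) and comments are not handled.
    static Map<String, String> parse(String style) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps declaration order
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) continue; // skip empty or malformed pieces
            String name = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = declaration.substring(colon + 1).trim();
            if (!name.isEmpty()) props.put(name, value);
        }
        return props;
    }

    // Rebuilds text suitable for a style attribute.
    static String serialize(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner("; ");
        props.forEach((name, value) -> joiner.add(name + ": " + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = parse("color: red; margin:2px 4px;");
        props.put("color", "blue");   // replace an existing property in place
        props.remove("margin");       // drop one
        props.put("padding", "3px");  // add a new one at the end
        System.out.println(serialize(props)); // prints: color: blue; padding: 3px
    }
}
```

A LinkedHashMap keeps the original declaration order, so unchanged properties are written back where they were; anything beyond simple declarations is better left to a real parser.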

Berin Loritsch answered Nov 11 '22 22:11
