I have the following HTML, which I want to parse.
My input:
<div>
<span id="x1x1"> bla bla </span>
</div>
<span>
<div> bla bla </div>
</span>
What I want to do in Java:
JAXBContext jaxbContext = JAXBContext.newInstance(Div.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
Div div1 = (Div) jaxbUnmarshaller.unmarshal(file);
System.out.println("id = " + div1.getSpan().get(0).getId() + " value = " + div1.getSpan().get(0).getValue());
// should print: id = x1x1 value = bla bla
I have the following class:
public class Span {
List<Div> div;
public List<Div> getDiv() {
return div;
}
@XmlElement
public void setDiv(List<Div> div) {
for (int i = 0; i < div.size(); i++) {
System.out.print("element");
}
this.div = div;
}
}
and:
public class Div {
List<Span> span;
@XmlElement
public void setSpan(List<Span> span) {
for (int i = 0; i < span.size(); i++) {
System.out.print("element");
}
this.span = span;
}
public List<Span> getSpan() {
return span;
}
}
Now I also want the value of the span ("bla bla"), so I added the following to the Span class:
String value;
public String getValue() {
return value;
}
@XmlValue
public void setValue(String value) {
this.value = value;
}
But it gives me the following error:
If a class has '@XmlElement' property, it cannot have '@XmlValue' property.
I tried to use @XmlMixed, but without success. I would be grateful for an explanation with a code example. Thanks.
Annotation Type XmlValue: enables mapping a class to an XML Schema complex type with a simpleContent or an XML Schema simple type. Usage: the @XmlValue annotation can be used with the following program elements: a JavaBean property; a non-static, non-transient field.
Using the xjc and schemagen tools on JDK 11: the JAXB-specific xjc and schemagen tools, which you use to convert an XML Schema (*.xsd file) to a set of Java classes and vice versa, are included with the JDK up to version 10, but have been removed in JDK 11.
Java Architecture for XML Binding (JAXB) provides a fast and convenient way to bind XML schemas and Java representations, making it easy for Java developers to incorporate XML data and processing functions in Java applications.
@XmlAccessorType. Package, Class. Defines the fields and properties of your Java classes that the JAXB engine uses for binding. It has four values: PUBLIC_MEMBER , FIELD , PROPERTY and NONE .
UPDATE
Any element that can have both text and elements as child nodes is said to have mixed content. In JAXB this corresponds to the @XmlMixed annotation. @XmlMixed can be used on its own on a collection property (see ORIGINAL ANSWER) or in combination with @XmlAnyElement, @XmlElementRef, or @XmlElementRefs. If the element can be anything, you would use @XmlAnyElement; if it is one known element, you would use @XmlElementRef; and if it is more than one known element, you would use @XmlElementRefs.
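To make "mixed content" concrete, here is a minimal sketch that uses the JDK's built-in DOM parser instead of JAXB, so it runs without any extra libraries. It lists the direct children of a root element and shows that text and elements are interleaved. The class name MixedContentDemo and the helper describeChildren are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: plain DOM parsing (not JAXB) to show what mixed content looks like.
class MixedContentDemo {

    // Hypothetical helper: labels each direct child of the root as text or element.
    static List<String> describeChildren(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> result = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE) {
                    result.add("text:" + child.getNodeValue());
                } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.add("element:" + child.getNodeName());
                }
            }
            return result;
        } catch (Exception e) {
            // Wrapped so callers do not have to deal with parser-specific checked exceptions.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints [text:Text, element:div, text:Text3]
        System.out.println(describeChildren("<span>Text<div>Text2</div>Text3</span>"));
    }
}
```

The span here has three children in document order: a text node, a div element, and another text node. That interleaving is what @XmlMixed models on the JAXB side.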
Span
If there will be both text and div elements within the same span element, you could do the following by annotating a property with both @XmlElementRef and @XmlMixed. The element name specified on the @XmlElementRef annotation must correspond directly to the root element specified for the target class.
@XmlRootElement
public class Span {
List<Object> items = new ArrayList<Object>();
@XmlMixed
@XmlElementRef(type=Div.class, name="div")
public List<Object> getItems() {
return items;
}
public void setItems(List<Object> items) {
this.items = items;
}
}
Div
The metadata for Div is almost identical to the metadata specified for Span.
@XmlRootElement
public class Div {
List<Object> items = new ArrayList<Object>();
@XmlElementRef(name="span", type=Span.class)
@XmlMixed
public List<Object> getItems() {
return items;
}
public void setItems(List<Object> items) {
this.items = items;
}
}
Demo
public class Demo {
public static void main(String[] args) throws Exception {
JAXBContext jc = JAXBContext.newInstance(Span.class);
Unmarshaller unmarshaller = jc.createUnmarshaller();
Span span = (Span) unmarshaller.unmarshal(new StringReader("<span>Text<div>Text2</div>Text3</span>"));
System.out.println(span.getItems());
Marshaller marshaller = jc.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.marshal(span, System.out);
}
}
Output
[Text, forum15495156.Div@289f6ae, Text3]
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<span>Text<div>Text2</div>Text3</span>
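As the output shows, the items list holds String entries for the text and object entries for the child elements. A rough sketch of how you might pull the pieces apart is below. It does not use JAXB at all: the nested Div class is only a stand-in for the bound class, and the helpers directText and childrenOfType are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: separating the text and element entries of a mixed-content list.
class MixedItemsDemo {

    // Stand-in for the JAXB-bound Div class, present only to keep the sketch self-contained.
    static class Div {
        private final List<Object> items = new ArrayList<>();
        List<Object> getItems() { return items; }
    }

    // Hypothetical helper: concatenates the String entries, skipping child elements.
    static String directText(List<Object> items) {
        StringBuilder text = new StringBuilder();
        for (Object item : items) {
            if (item instanceof String) {
                text.append((String) item);
            }
        }
        return text.toString();
    }

    // Hypothetical helper: returns the entries that are instances of the given class.
    static <T> List<T> childrenOfType(List<Object> items, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object item : items) {
            if (type.isInstance(item)) {
                result.add(type.cast(item));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same shape as the unmarshalled <span>Text<div>Text2</div>Text3</span>.
        Div div = new Div();
        div.getItems().add("Text2");
        List<Object> spanItems = Arrays.asList("Text", div, "Text3");

        System.out.println(directText(spanItems));                       // prints TextText3
        System.out.println(childrenOfType(spanItems, Div.class).size()); // prints 1
    }
}
```

With the real classes you would pass span.getItems() to these helpers instead of a hand-built list.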
ORIGINAL ANSWER
You could add a List<String> property annotated with @XmlMixed to your Span class:
Span
import java.util.List;
import javax.xml.bind.annotation.*;
@XmlRootElement
public class Span {
List<Div> div;
List<String> mixed;
@XmlMixed
public List<String> getMixed() {
return mixed;
}
public void setMixed(List<String> mixed) {
this.mixed = mixed;
}
public List<Div> getDiv() {
return div;
}
@XmlElement
public void setDiv(List<Div> div) {
for (int i = 0; i < div.size(); i++) {
System.out.print("element");
}
this.div = div;
}
}
Demo
import java.io.StringReader;
import javax.xml.bind.*;
public class Demo {
public static void main(String[] args) throws Exception {
JAXBContext jc = JAXBContext.newInstance(Span.class);
Unmarshaller unmarshaller = jc.createUnmarshaller();
Span span1 = (Span) unmarshaller.unmarshal(new StringReader("<span>bla bla bla</span>"));
System.out.println(span1.getMixed());
Span span2 = (Span) unmarshaller.unmarshal(new StringReader("<span><div/><div/></span>"));
System.out.println(span2.getDiv());
}
}
Output
[bla bla bla]
elementelement[forum15495156.Div@1f80ce47, forum15495156.Div@4166a779]
Often, XML documents that you need to bind with JAXB do not come with an XSD for the content, but there are some great tools for automating this work, if you have an XSD. This is the process I use to fill this gap quickly and get quality binding code. Hopefully this helps answer this question and provides a general solution for this type of problem.
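This is the process that I used to create the code for this random piece of XML: create an example document, generate an XSD from it with Trang, and generate the binding classes from the XSD with XJC.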
The entire process took me under 5 minutes, with the tools preinstalled, and produces high quality results. This is a very simple example, but the complexity of the example XML document could easily go up, without increasing the process time or lowering quality.
The example document is the most important part of this process. For more complex structures, you may need several documents to capture the information that you need, but we will stick to a single document case. We can make an example for the problem by wrapping the provided input in a <div/>, to create a file called example.xml:
<div>
<div>
<span id="x1x1"> bla bla </span>
</div>
<span>
<div> bla bla </div>
</span>
</div>
This example demonstrates that the <div/>s and <span/>s can be nested in each other and contain content.
NOTE: This HTML fragment is not valid, since block level elements cannot be nested inside inline elements. An "off the shelf" schema, and code generated from it, would probably choke on this input.
This is the voodoo step in this process. Creating the XSD by hand would introduce a lot of work and possibility for error. Without an automated process, you might as well ditch the complexity of the generator and hand code the annotations. Luckily, there is a tool called Trang that will fill this gap.
Trang can do a lot of things, but one task that it excels at is producing XSDs from XML documents. For simple structures, it can completely handle this step. For more complex input, it can get you most of the way there.
Trang is available from Maven Central at these coordinates:
<dependency>
<groupId>com.thaiopensource</groupId>
<artifactId>trang</artifactId>
<version>20091111</version>
</dependency>
You can download Trang and transform the example.xml document with these commands:
wget http://repo1.maven.org/maven2/com/thaiopensource/trang/20091111/trang-20091111.jar
java -jar trang-20091111.jar example.xml example.xsd
This produces example.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="div">
<xs:complexType mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="div"/>
<xs:element ref="span"/>
</xs:choice>
</xs:complexType>
</xs:element>
<xs:element name="span">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="div"/>
</xs:sequence>
<xs:attribute name="id" type="xs:NCName"/>
</xs:complexType>
</xs:element>
</xs:schema>
For simple documents, that is usually all it takes. For more complex structures, you may have to edit this file a little, but at least you have a working XSD as a starting point.
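Before generating code, it can be worth checking that the generated XSD really accepts your example document. The following is a small sketch using the JDK's javax.xml.validation API. The schema is inlined as a string (without the XML declaration) so the example is self-contained; in practice you would point it at example.xsd and example.xml. The class name SchemaCheckDemo and the methods compileSchema and isValid are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Sketch: validating a document against the Trang-generated schema.
class SchemaCheckDemo {

    // The content of example.xsd, inlined for the sake of a runnable example.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"qualified\">"
        + "<xs:element name=\"div\"><xs:complexType mixed=\"true\">"
        + "<xs:choice minOccurs=\"0\" maxOccurs=\"unbounded\">"
        + "<xs:element ref=\"div\"/><xs:element ref=\"span\"/>"
        + "</xs:choice></xs:complexType></xs:element>"
        + "<xs:element name=\"span\"><xs:complexType mixed=\"true\">"
        + "<xs:sequence><xs:element minOccurs=\"0\" maxOccurs=\"unbounded\" ref=\"div\"/></xs:sequence>"
        + "<xs:attribute name=\"id\" type=\"xs:NCName\"/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    // The content of example.xml from above.
    static final String EXAMPLE =
          "<div><div><span id=\"x1x1\"> bla bla </span></div>"
        + "<span><div> bla bla </div></span></div>";

    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is not valid", e);
        }
    }

    // Returns true if the document conforms to the schema, false if validation fails.
    static boolean isValid(String xml) {
        try {
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(EXAMPLE));                // prints true
        System.out.println(isValid("<span><span/></span>")); // prints false: span may only contain div
    }
}
```

The second check illustrates a limit of inferring a schema from one sample: Trang only saw div inside span, so a span nested in a span is rejected. If your real input allows that, add another example document or edit the XSD by hand.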
Now that we have an XSD, we can leverage the XJC tool and produce the binding code that we are looking for. To run XJC, pass it an XSD, the package you want to create, and a src directory. These two commands will generate the code for example.xsd in a package called example:
mkdir src
xjc -d src -p example example.xsd
Now, you will have the following files in the src directory:
src/example/Div.java
src/example/ObjectFactory.java
src/example/Span.java
I have included the contents of the files at the end of this article, but here is the piece we are interested in, from Div.java:
@XmlElementRefs({
@XmlElementRef(name = "div", type = Div.class),
@XmlElementRef(name = "span", type = Span.class)
})
@XmlMixed
protected List<Object> content;
Although hand coding the annotations can work, automating the creation of these files can save time and improve quality. It also gives you access to all of the plugins that are available for the XJC tool.
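To tie this back to the original question (printing the id and the text of a span), here is a rough sketch of walking the nested content lists. It deliberately avoids a JAXB dependency: the nested Div and Span classes are hand-written stand-ins that only mirror the getContent() and getId() accessors of the generated classes, the object graph for example.xml is built by hand rather than unmarshalled, and the names ContentWalkDemo, collectSpans and directText are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: recursively visiting every span in a nested div/span structure.
class ContentWalkDemo {

    // Stand-ins that mirror the accessors of the generated classes (no JAXB annotations).
    static class Div {
        private final List<Object> content = new ArrayList<>();
        List<Object> getContent() { return content; }
    }

    static class Span {
        private final List<Object> content = new ArrayList<>();
        private String id;
        List<Object> getContent() { return content; }
        String getId() { return id; }
        void setId(String value) { this.id = value; }
    }

    // Concatenates the String entries of a mixed-content list.
    static String directText(List<Object> content) {
        StringBuilder text = new StringBuilder();
        for (Object item : content) {
            if (item instanceof String) {
                text.append((String) item);
            }
        }
        return text.toString();
    }

    // Adds one "id = ... value = ..." line per span found anywhere below the given content.
    static void collectSpans(List<Object> content, List<String> out) {
        for (Object item : content) {
            if (item instanceof Span) {
                Span span = (Span) item;
                out.add("id = " + span.getId() + " value = " + directText(span.getContent()).trim());
                collectSpans(span.getContent(), out);
            } else if (item instanceof Div) {
                collectSpans(((Div) item).getContent(), out);
            }
        }
    }

    // Builds the object graph for example.xml by hand and describes its spans.
    static List<String> describeExample() {
        Span inner = new Span();
        inner.setId("x1x1");
        inner.getContent().add(" bla bla ");
        Div first = new Div();
        first.getContent().add(inner);

        Div nested = new Div();
        nested.getContent().add(" bla bla ");
        Span second = new Span();
        second.getContent().add(nested);

        Div root = new Div();
        root.getContent().add(first);
        root.getContent().add(second);

        List<String> out = new ArrayList<>();
        collectSpans(root.getContent(), out);
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeExample()) {
            System.out.println(line);
        }
    }
}
```

The first line printed is "id = x1x1 value = bla bla". The second span has no id and no direct text, so its line shows a null id and an empty value. With unmarshalled objects, expect extra whitespace-only String entries between elements when the source XML is indented.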
example/Div.java:
//
// This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, vJAXB 2.1.10 in JDK 6
// See <a href="http://java.sun.com/xml/jaxb">http://java.sun.com/xml/jaxb</a>
// Any modifications to this file will be lost upon recompilation of the source schema.
// Generated on: 2013.03.22 at 01:15:22 PM MST
//
package example;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElementRef;
import javax.xml.bind.annotation.XmlElementRefs;
import javax.xml.bind.annotation.XmlMixed;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;
/**
* <p>Java class for anonymous complex type.
*
* <p>The following schema fragment specifies the expected content contained within this class.
*
* <pre>
* <complexType>
* <complexContent>
* <restriction base="{http://www.w3.org/2001/XMLSchema}anyType">
* <choice maxOccurs="unbounded" minOccurs="0">
* <element ref="{}div"/>
* <element ref="{}span"/>
* </choice>
* </restriction>
* </complexContent>
* </complexType>
* </pre>
*
*
*/
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
"content"
})
@XmlRootElement(name = "div")
public class Div {
@XmlElementRefs({
@XmlElementRef(name = "div", type = Div.class),
@XmlElementRef(name = "span", type = Span.class)
})
@XmlMixed
protected List<Object> content;
/**
* Gets the value of the content property.
*
* <p>
* This accessor method returns a reference to the live list,
* not a snapshot. Therefore any modification you make to the
* returned list will be present inside the JAXB object.
* This is why there is not a <CODE>set</CODE> method for the content property.
*
* <p>
* For example, to add a new item, do as follows:
* <pre>
* getContent().add(newItem);
* </pre>
*
*
* <p>
* Objects of the following type(s) are allowed in the list
* {@link Div }
* {@link String }
* {@link Span }
*
*
*/
public List<Object> getContent() {
if (content == null) {
content = new ArrayList<Object>();
}
return this.content;
}
}
example/Span.java
//
// This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, vJAXB 2.1.10 in JDK 6
// See <a href="http://java.sun.com/xml/jaxb">http://java.sun.com/xml/jaxb</a>
// Any modifications to this file will be lost upon recompilation of the source schema.
// Generated on: 2013.03.22 at 01:15:22 PM MST
//
package example;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElementRef;
import javax.xml.bind.annotation.XmlMixed;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlSchemaType;
import javax.xml.bind.annotation.XmlType;
import javax.xml.bind.annotation.adapters.CollapsedStringAdapter;
import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;
/**
* <p>Java class for anonymous complex type.
*
* <p>The following schema fragment specifies the expected content contained within this class.
*
* <pre>
* <complexType>
* <complexContent>
* <restriction base="{http://www.w3.org/2001/XMLSchema}anyType">
* <sequence>
* <element ref="{}div" maxOccurs="unbounded" minOccurs="0"/>
* </sequence>
* <attribute name="id" type="{http://www.w3.org/2001/XMLSchema}NCName" />
* </restriction>
* </complexContent>
* </complexType>
* </pre>
*
*
*/
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
"content"
})
@XmlRootElement(name = "span")
public class Span {
@XmlElementRef(name = "div", type = Div.class)
@XmlMixed
protected List<Object> content;
@XmlAttribute
@XmlJavaTypeAdapter(CollapsedStringAdapter.class)
@XmlSchemaType(name = "NCName")
protected String id;
/**
* Gets the value of the content property.
*
* <p>
* This accessor method returns a reference to the live list,
* not a snapshot. Therefore any modification you make to the
* returned list will be present inside the JAXB object.
* This is why there is not a <CODE>set</CODE> method for the content property.
*
* <p>
* For example, to add a new item, do as follows:
* <pre>
* getContent().add(newItem);
* </pre>
*
*
* <p>
* Objects of the following type(s) are allowed in the list
* {@link Div }
* {@link String }
*
*
*/
public List<Object> getContent() {
if (content == null) {
content = new ArrayList<Object>();
}
return this.content;
}
/**
* Gets the value of the id property.
*
* @return
* possible object is
* {@link String }
*
*/
public String getId() {
return id;
}
/**
* Sets the value of the id property.
*
* @param value
* allowed object is
* {@link String }
*
*/
public void setId(String value) {
this.id = value;
}
}
example/ObjectFactory.java
//
// This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, vJAXB 2.1.10 in JDK 6
// See <a href="http://java.sun.com/xml/jaxb">http://java.sun.com/xml/jaxb</a>
// Any modifications to this file will be lost upon recompilation of the source schema.
// Generated on: 2013.03.22 at 01:15:22 PM MST
//
package example;
import javax.xml.bind.annotation.XmlRegistry;
/**
* This object contains factory methods for each
* Java content interface and Java element interface
* generated in the example package.
* <p>An ObjectFactory allows you to programatically
* construct new instances of the Java representation
* for XML content. The Java representation of XML
* content can consist of schema derived interfaces
* and classes representing the binding of schema
* type definitions, element declarations and model
* groups. Factory methods for each of these are
* provided in this class.
*
*/
@XmlRegistry
public class ObjectFactory {
/**
* Create a new ObjectFactory that can be used to create new instances of schema derived classes for package: example
*
*/
public ObjectFactory() {
}
/**
* Create an instance of {@link Div }
*
*/
public Div createDiv() {
return new Div();
}
/**
* Create an instance of {@link Span }
*
*/
public Span createSpan() {
return new Span();
}
}