
Is my understanding of DTD correct?

Tags: xml, dtd

I am self-learning XML and here is the first DTD I wrote. Below is the XML data followed by the DTD.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE people SYSTEM "validator.dtd">

<people>
    <student>
        <name>John</name>
        <course>Computer Technology</course>
        <semester>6</semester>
        <scheme>E</scheme>
    </student>

    <student>
        <name>Foo</name>
        <course>Industrial Electronics</course>
        <semester>6</semester>
        <scheme>E</scheme>
    </student>
</people>  

and the DTD

<!ELEMENT people (student)*>
<!ELEMENT student (name,course,semester,scheme)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT course (#PCDATA)>
<!ELEMENT semester (#PCDATA)>
<!ELEMENT scheme (#PCDATA)>  

Here is my understanding of the DTD so far.
I have a root element named people that contains student elements. Because of the *, people can contain zero or more student elements. I guess it should be changed to + (one or more), because that makes more sense?

Inside student are name, course, semester and scheme. When I leave out any symbols after the closing parentheses then it means that each of these tags can appear only once inside the student tag. This means a student cannot have more than one name, more than one semester, etc.

Finally, name, course, semester and scheme have #PCDATA, which means the data is going to be parsed by someone else. Does leaving out the symbol here make a difference?

asked Dec 20 '25 16:12 by An SO User


1 Answer

You said:

When I leave out any symbols after the closing parentheses then it means that each of these tags can appear only once inside the student tag.

I would only add that each of those tags must appear once; they are not optional unless you have a ? or *. Also, they must appear in that order (since you used ,).
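
As a rough sketch of how the occurrence indicators change the content model (the repeatable course and optional scheme here are hypothetical variations for illustration, not something your data requires):

<!ELEMENT people (student)+>
<!ELEMENT student (name,course+,semester,scheme?)>

With these declarations, people would need at least one student, and each student would need exactly one name, one or more course elements, exactly one semester, and at most one scheme, still in that order.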

#PCDATA means parsed character data which is basically text that will be parsed by the parser. For example, the text "Sample &text;" would get parsed and the &text; entity reference would be resolved.
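
To illustrate, assume the DTD declared an internal entity named text (a hypothetical declaration; the DTD in the question declares no entities):

<!ENTITY text "replacement">

<name>Sample &text;</name>

A parser that reads this declaration would report the content of name as "Sample replacement".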

The only time you really need a symbol (occurrence indicator) for #PCDATA is when you have mixed content (both text and elements). It has to be an * too:

<!ELEMENT elem (#PCDATA|anotherElem)*>
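
For instance, assuming anotherElem is itself declared to hold only text (a hypothetical declaration added for this sketch), an instance like the following would be valid against that content model:

<!ELEMENT anotherElem (#PCDATA)>

<elem>Some text <anotherElem>nested</anotherElem> and more text.</elem>

Note that with mixed content the DTD can only list which child elements are allowed. It cannot constrain their order or how many times they appear.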
answered Dec 23 '25 10:12 by Daniel Haley


