Which XML parser to use here?

Tags: java, xml, xml-parsing, jaxb

I am receiving an XML file as input, and its size can vary from a few KB to much larger. I get this file over the network. I only need to extract a small number of nodes, so most of the document is useless to me. I have no memory constraints; I just need speed.

Considering all this, I concluded:

No DOM here (the document may be huge, I have no CRUD requirement, and the source is the network).
No SAX, as I only need a small subset of the data.
StAX could be the way to go, but I am not sure whether it is the fastest option.
JAXB came up as another option, but what kind of parser does it use? I read that it uses Xerces by default (is that push or pull?), although I can configure it to use StAX or Woodstox, as per this link.
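On the Woodstox point: StAX is a pluggable API, and `XMLInputFactory.newInstance()` selects an implementation through a lookup (system property, configuration file, service providers on the classpath, then the JDK's built-in default). A quick sketch to see which implementation you actually get; the class name `StaxImplCheck` is only illustrative.

```java
import javax.xml.stream.XMLInputFactory;

public class StaxImplCheck {

    // Returns the class name of the StAX input factory chosen by the standard lookup.
    // With only the JDK this is the built-in implementation; if a provider such as
    // Woodstox is on the classpath, the lookup may select that one instead.
    public static String factoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(factoryClassName());
    }
}
```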

I've been reading a lot, but I'm still confused by so many options! Any help would be appreciated.

Thanks!

Edit: I want to add one more question: what is wrong with using JAXB here?


asked Aug 14 '11 16:08

zombie

2 Answers

The fastest solution is, by far, a StAX parser, especially since you only need a specific subset of the XML file: with StAX you pull only the events you need and can easily skip whatever isn't necessary, whereas a SAX parser would push every event to your handler anyway.
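A minimal sketch of that pull-style advantage, assuming the XML is available as a String: the loop stops pulling events as soon as it finds the element it wants. `StaxFirstMatch` and `firstElementText` are hypothetical names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxFirstMatch {

    // Pulls events until the first element named `name` is found and returns its text.
    // Once the value is found, no further events are pulled from the reader.
    public static String firstElementText(String xml, String name) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && name.equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching END_ELEMENT;
                    // it throws if the element contains child elements
                    return reader.getElementText();
                }
            }
            return null; // no element with that name
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<order><id>42</id><items><item>a</item><item>b</item></items></order>";
        System.out.println(firstElementText(xml, "id")); // prints 42
    }
}
```

For input coming over the network you would pass the connection's `InputStream` to `createXMLStreamReader(InputStream)` instead of wrapping a String, so the document does not have to be held in memory as a whole.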

But it's also a bit more complicated than using SAX or DOM. The other day I had to write a StAX parser for the following XML:

<?xml version="1.0"?>
<table>
    <row>
        <column>1</column>
        <column>Nome</column>
        <column>Sobrenome</column>
        <column>[email protected]</column>
        <column></column>
        <column>2011-06-22 03:02:14.915</column>
        <column>2011-06-22 03:02:25.953</column>
        <column></column>
        <column></column>
    </row>
</table>
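To see why a cursor-based parser for this format has to remember which element it is in, here is a sketch (hypothetical class `StaxEventTrace`) that lists the events a StAX reader reports for a trimmed-down `<row>`. The indentation between `<column>` elements arrives as whitespace-only CHARACTERS events, and an empty `<column></column>` typically produces no text event at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxEventTrace {

    // Describes each element and text event as a short string, e.g. "START_ELEMENT row",
    // "CHARACTERS [1]" or "CHARACTERS (whitespace)" for whitespace-only text.
    public static List<String> trace(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> events = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        events.add("START_ELEMENT " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        events.add("END_ELEMENT " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        events.add(reader.isWhiteSpace()
                                ? "CHARACTERS (whitespace)"
                                : "CHARACTERS [" + reader.getText() + "]");
                        break;
                    default:
                        break; // comments, END_DOCUMENT, etc. are not recorded
                }
            }
        } finally {
            reader.close();
        }
        return events;
    }
}
```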

Here's what the final parser code looks like:

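import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
// Apache Commons IO and Commons Lang (package names assume Commons Lang 2.x)
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang.StringEscapeUtils;
import org.apache.commons.lang.WordUtils;

// Inscrito ("registrant") is the model class; it is not shown in this answer.
public class Parser {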

private String[] files;

public Parser(String ... files) {
    this.files = files;
}

private List<Inscrito> process() {

    List<Inscrito> inscritos = new ArrayList<Inscrito>();


    for ( String file : files ) {

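        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Deliver adjacent character data as a single CHARACTERS event
        factory.setProperty( XMLInputFactory.IS_COALESCING, Boolean.TRUE );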

        try {

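            // Reads the whole file into memory and unescapes XML entities before parsing
            String content = StringEscapeUtils.unescapeXml( FileUtils.readFileToString( new File(file) ) );

            // Parse the String through a Reader, avoiding a round trip through the platform's default charset
            XMLStreamReader parser = factory.createXMLStreamReader( new StringReader( content ) );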

            String currentTag = null;
            int columnCount = 0;
            Inscrito inscrito = null;           

            while ( parser.hasNext() ) {

                int currentEvent = parser.next();

                switch ( currentEvent ) {
                case XMLStreamReader.START_ELEMENT: 

                    currentTag = parser.getLocalName();

                    if ( "row".equals( currentTag ) ) {
                        columnCount = 0;
                        inscrito = new Inscrito();                      
                    }

                    break;
                case XMLStreamReader.END_ELEMENT:

                    currentTag = parser.getLocalName();

                    if ( "row".equals( currentTag ) ) {
                        inscritos.add( inscrito );
                    }

                    if ( "column".equals( currentTag ) ) {
                        columnCount++;
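                    }

                    // Forget the tag so the whitespace after </column> is not
                    // mistaken for the text of the next column
                    currentTag = null;

                    break;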
                case XMLStreamReader.CHARACTERS:

                    if ( "column".equals( currentTag ) ) {

                        String text = parser.getText().trim().replaceAll( "\n" , " "); 

                        switch( columnCount ) {
                        case 0:
                            inscrito.setId( Integer.valueOf( text ) );
                            break;
                        case 1:                         
                            inscrito.setFirstName( WordUtils.capitalizeFully( text ) );
                            break;
                        case 2:
                            inscrito.setLastName( WordUtils.capitalizeFully( text ) );
                            break;
                        case 3:
                            inscrito.setEmail( text );
                            break;
                        }

                    }

                    break;
                }

            }

            parser.close();

        } catch (Exception e) {
            throw new IllegalStateException(e);
        }           

    }

    Collections.sort(inscritos);

    return inscritos;

}

public Map<String,List<Inscrito>> parse() {

    List<Inscrito> inscritos = this.process();

    Map<String,List<Inscrito>> resultado = new LinkedHashMap<String, List<Inscrito>>();

    for ( Inscrito i : inscritos ) {

        List<Inscrito> lista = resultado.get( i.getInicial() );

        if ( lista == null ) {
            lista = new ArrayList<Inscrito>();
            resultado.put( i.getInicial(), lista );
        }

        lista.add( i );

    }

    return resultado;
}

}
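A possible simplification of the parser above, sketched under the assumption that plain strings are enough (the `Inscrito` class is not shown, so this returns one `List<String>` per row instead): calling `getElementText()` on each `<column>` start tag reads exactly that column's text, so no `currentTag` or whitespace bookkeeping is needed. `RowColumnReader` and `readRows` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RowColumnReader {

    // Returns, for each <row>, the trimmed text of its first `maxColumns` <column> children.
    // Later columns are still parsed but their text is not collected.
    public static List<List<String>> readRows(String xml, int maxColumns) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<List<String>> rows = new ArrayList<>();
        List<String> current = null;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("row".equals(name)) {
                        current = new ArrayList<>();
                    } else if ("column".equals(name) && current != null
                            && current.size() < maxColumns) {
                        // Reads all text up to </column> (throws if the column has child
                        // elements); whitespace between columns never reaches this point
                        current.add(reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    rows.add(current);
                    current = null;
                }
            }
        } finally {
            reader.close();
        }
        return rows;
    }
}
```

With `maxColumns = 4` this collects the id, first name, last name and e-mail of each row, which could then be mapped onto `Inscrito` objects.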

The code itself is in Portuguese, but it should be straightforward to understand what it does; here's the repo on GitHub.


answered Oct 02 '22 04:10

Maurício Linhares

If you're only extracting a small amount of data, consider using XPath, as it is somewhat simpler than walking through the whole document yourself.
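A minimal XPath sketch using the JDK's `javax.xml.xpath` API (the class name `XPathExtract` is hypothetical). Keep in mind that evaluating against an `InputSource` builds an in-memory model of the entire document, so this trades speed and memory for simplicity compared with StAX.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathExtract {

    // Evaluates an XPath expression against the XML and returns its string value
    // ("" when nothing matches). The whole input is parsed into a data model first.
    public static String evaluate(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }
}
```

For example, `evaluate(xml, "/table/row/column[2]")` returns the text of the second column of the first row.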

answered Oct 02 '22 06:10

Hovercraft Full Of Eels
