DOCTYPE gives 'unexpected' error when XmlPullParser is used

Tags: java, android

Logcat shows the error below. The parser fails on the <!DOCTYPE ...> declaration at the start of any XML document. When I ran my program against a page that has no DOCTYPE declaration, it parsed successfully. I have used setFeature to enable FEATURE_PROCESS_DOCDECL, but that does not solve the problem.

The Error: org.xmlpull.v1.XmlPullParserException: Unexpected <! (position:START_DOCUMENT null@1:1 in java.io.InputStreamReader@424355f0)

Excerpt of my code:

    URL url = new URL("http://www.google.co.in/webhp?hl=en&tab=ww");

    // Features are set on the factory and inherited by the parsers it creates.
    XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, true);
    factory.setFeature(XmlPullParser.FEATURE_PROCESS_DOCDECL, true);
    XmlPullParser xpp = factory.newPullParser();

    InputStream ip = url.openConnection().getInputStream();
    xpp.setInput(ip, "UTF-8");

    Message msg = mHandler.obtainMessage();
    Bundle bundle = new Bundle();

    int eventType = xpp.getEventType();
    while (eventType != XmlPullParser.END_DOCUMENT) {
        if (eventType == XmlPullParser.START_TAG
                && xpp.getName().equalsIgnoreCase("title")) {
            // Send the title text to the handler, then stop:
            // a Message can only be sent once.
            bundle.putString("message", xpp.nextText());
            msg.setData(bundle);
            mHandler.sendMessage(msg);
            break;
        }
        // Always advance, so DOCDECL and other tokens are skipped
        // instead of being revisited forever.
        eventType = xpp.nextToken();
    }
abishekshenoy asked Mar 24 '26 02:03



1 Answer

I've been having a similar problem. It looks like XmlPullParser does not accept the lowercase <!doctype html>; it expects the uppercase form <!DOCTYPE html>. (Related: Uppercase or lowercase doctype?)

The check that causes this is in org.kxml2.io.KXmlParser.java, in the peekType method:

/**
 * Returns the type of the next token.
 */
private int peekType(boolean inDeclaration) throws IOException, XmlPullParserException {

Beginning at line 1003:

    case '!':
        switch (buffer[position + 2]) {
            case 'D':
                return DOCDECL; // <!D
            case '[':
                return CDSECT; // <![
            case '-':
                return COMMENT; // <!-
            case 'E':
                switch (buffer[position + 3]) {
                    case 'L':
                        return ELEMENTDECL; // <!EL
                    case 'N':
                        return ENTITYDECL; // <!EN
                }
                break;
            case 'A':
                return ATTLISTDECL;  // <!A
            case 'N':
                return NOTATIONDECL; // <!N
        }
        throw new XmlPullParserException("Unexpected <!", this, null);

The workaround I used was to search the input for that specific line and upper-case it before passing the document to the parser.
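For illustration, here is a minimal sketch of that kind of workaround. It is not the code from this answer: the class name DoctypeNormalizer and both method names are hypothetical, and it assumes the page is UTF-8 and small enough to buffer in memory. It reads the whole response into a String and upper-cases the first doctype keyword, so the 'D' case in the switch above can match.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, not part of kxml2 or the Android SDK.
public class DoctypeNormalizer {
    // Matches "<!doctype" in any letter case.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!doctype", Pattern.CASE_INSENSITIVE);

    /** Upper-cases the first doctype keyword; the rest of the text is unchanged. */
    public static String normalizeDoctype(String markup) {
        return DOCTYPE.matcher(markup).replaceFirst("<!DOCTYPE");
    }

    /** Buffers the whole stream, decodes it as UTF-8 (an assumption), and normalizes it. */
    public static String readAndNormalize(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        String markup = new String(out.toByteArray(), StandardCharsets.UTF_8);
        return normalizeDoctype(markup);
    }
}
```

With the question's variables, the result could then be handed to the parser through a reader, for example xpp.setInput(new StringReader(DoctypeNormalizer.readAndNormalize(ip))). This only gets past the doctype check. A real-world HTML page is often not well-formed XML, so the parser may still fail further into the document.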

David Salvador answered Mar 25 '26 14:03
