
TypeError: 'NoneType' object is not callable in Python with BeautifulSoup XML

I have the following XML file:

<user-login-permission>true</user-login-permission>
        <total-matched-record-number>15000</total-matched-record-number>
        <total-returned-record-number>15000</total-returned-record-number>
        <active-user-records>
            <active-user-record>
                <active-user-name>username</active-user-name>
                <authentication-realm>realm</authentication-realm>
                <user-roles>Role</user-roles>
                <user-sign-in-time>date</user-sign-in-time>
                <events>0</events>
                <agent-type>text</agent-type>
                <login-node>node</login-node>
             </active-user-record> 

There are many records. I'm trying to get the values from some of the tags and save them in a separate text file using the following code:

soup = BeautifulSoup(open("path/to/xmlfile"), features="xml") 


with open('path/to/outputfile', 'a') as f:
    for i in range(len(soup.findall('active-user-name'))):
        f.write ('%s\t%s\t%s\t%s\n' % (soup.findall('active-user-name')[i].text, soup.findall('authentication-realm')[i].text, soup.findall('user-roles')[i].text, soup.findall('login-node')[i].text))

I get the error TypeError: 'NoneType' object is not callable for the line: for i in range(len(soup.findall('active-user-name'))):

Any idea what could be causing this?

Thanks!

user2633192 asked Aug 01 '13 07:08


2 Answers

There are a few issues to address here. The first is that the XML you provided is not valid XML: a root element is required.

Try something like this as the XML:

<root>
    <user-login-permission>true</user-login-permission>
    <total-matched-record-number>15000</total-matched-record-number>
    <total-returned-record-number>15000</total-returned-record-number>
    <active-user-records>

        <active-user-record>
            <active-user-name>username</active-user-name>
            <authentication-realm>realm</authentication-realm>
            <user-roles>Role</user-roles>
            <user-sign-in-time>date</user-sign-in-time>
            <events>0</events>
            <agent-type>text</agent-type>
            <login-node>node</login-node>
        </active-user-record>

    </active-user-records>
</root>

Now on to the Python. First, there is no findall method; the method is named either findAll or find_all. The two are equivalent, as described in the Beautiful Soup documentation.
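As for why the error mentions NoneType rather than a missing method: as far as I know, Beautiful Soup treats an unknown attribute such as soup.findall as shorthand for soup.find("findall"), i.e. a search for a tag named findall. No such tag exists, so the lookup returns None, and calling None raises the TypeError. A minimal sketch of that behaviour follows. The sample records (alice, bob) are made up for illustration, and html.parser is used only so the snippet runs without lxml; the attribute lookup should behave the same with the "xml" parser.

```python
from bs4 import BeautifulSoup

# Made-up sample data, shaped like the records in the question.
xml = """
<root>
  <active-user-record><active-user-name>alice</active-user-name></active-user-record>
  <active-user-record><active-user-name>bob</active-user-name></active-user-record>
</root>
"""

# html.parser is an assumption made so the sketch does not depend on lxml.
soup = BeautifulSoup(xml, "html.parser")

# An unknown attribute is looked up as a tag name: soup.find("findall").
# There is no <findall> tag, so the result is None.
missing = soup.findall
print(missing)

# Calling that None value is what raises the TypeError from the question.
error = None
try:
    soup.findall("active-user-name")
except TypeError as exc:
    error = str(exc)
    print(error)

# The real method is find_all (findAll is the older spelling).
names = [tag.text for tag in soup.find_all("active-user-name")]
print(names)
```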

Next, I would suggest restructuring your code so that it calls find_all far less often. Your loop re-runs find_all over the entire document four times for every record. Iterating over each active-user-record once and calling find on that record is more efficient, especially for large XML files. The code below is also easier to read and debug:

from bs4 import BeautifulSoup

# The "xml" parser requires lxml to be installed.
with open('./path_to_file.xml', 'r') as xml_file:
    soup = BeautifulSoup(xml_file, "xml")

with open('./path_to_output_f.txt', 'a') as f:
    # Visit each record once and read its child tags.
    for s in soup.find_all('active-user-record'):
        username = s.find('active-user-name').text
        auth = s.find('authentication-realm').text
        role = s.find('user-roles').text
        node = s.find('login-node').text
        # Write one tab-separated line per record.
        f.write("{}\t{}\t{}\t{}\n".format(username, auth, role, node))
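One caveat with the loop above: find returns None when a record lacks one of the child tags, and .text on None then fails with an AttributeError. If your real data might have gaps, a small helper can substitute a default. This is only a sketch under assumptions: text_or_default and FIELDS are hypothetical names of my own, the sample records are invented, and html.parser is used just so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Invented sample data; the second record has no <authentication-realm>.
xml = """
<root>
  <active-user-record>
    <active-user-name>alice</active-user-name>
    <authentication-realm>realm1</authentication-realm>
    <user-roles>Admin</user-roles>
    <login-node>node1</login-node>
  </active-user-record>
  <active-user-record>
    <active-user-name>bob</active-user-name>
    <user-roles>User</user-roles>
    <login-node>node2</login-node>
  </active-user-record>
</root>
"""

# Tag names taken from the question, in output column order.
FIELDS = ["active-user-name", "authentication-realm", "user-roles", "login-node"]


def text_or_default(record, tag_name, default=""):
    """Hypothetical helper: text of the first matching child, or a default."""
    child = record.find(tag_name)
    return child.get_text(strip=True) if child is not None else default


# html.parser is an assumption made so the sketch does not depend on lxml.
soup = BeautifulSoup(xml, "html.parser")

# Build one tab-separated row per record, leaving missing fields empty.
rows = [
    "\t".join(text_or_default(record, name) for name in FIELDS)
    for record in soup.find_all("active-user-record")
]

for row in rows:
    print(row)
```

Each row can then be written to the output file with f.write(row + "\n") in place of the format call.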

Hope this helps. Let me know if you require any further assistance!

Hayden answered Sep 27 '22 16:09


The solution is simple: don't use findall, use find_all.

Why? Because there is no findall method. There are findAll and find_all, which are equivalent. See the Beautiful Soup documentation for more information.

I agree, though, that the error message is confusing.

Hope that helps.

alecxe answered Sep 27 '22 15:09