I have the following XML file :
<user-login-permission>true</user-login-permission>
<total-matched-record-number>15000</total-matched-record-number>
<total-returned-record-number>15000</total-returned-record-number>
<active-user-records>
<active-user-record>
<active-user-name>username</active-user-name>
<authentication-realm>realm</authentication-realm>
<user-roles>Role</user-roles>
<user-sign-in-time>date</user-sign-in-time>
<events>0</events>
<agent-type>text</agent-type>
<login-node>node</login-node>
</active-user-record>
There are many records I'm trying to get values from tags and save them in a different text file using the following code :
soup = BeautifulSoup(open("path/to/xmlfile"), features="xml")
with open('path/to/outputfile', 'a') as f:
for i in range(len(soup.findall('active-user-name'))):
f.write ('%s\t%s\t%s\t%s\n' % (soup.findall('active-user-name')[i].text, soup.findall('authentication-realm')[i].text, soup.findall('user-roles')[i].text, soup.findall('login-node')[i].text))
I get the error TypeError : 'NoneType' object not callable Python with BeautifulSoup XML for line : for i in range(len(soup.findall('active-user-name'))):
Any idea what could be causing this?
Thanks!
There are a number of issues that need to be addressed with this, the first is that the XML file you provided is not valid XML - a root element is required.
Try something like this as the XML:
<root>
<user-login-permission>true</user-login-permission>
<total-matched-record-number>15000</total-matched-record-number>
<total-returned-record-number>15000</total-returned-record-number>
<active-user-records>
<active-user-record>
<active-user-name>username</active-user-name>
<authentication-realm>realm</authentication-realm>
<user-roles>Role</user-roles>
<user-sign-in-time>date</user-sign-in-time>
<events>0</events>
<agent-type>text</agent-type>
<login-node>node</login-node>
</active-user-record>
</active-user-records>
</root>
Now onto the python. First off there is not a findall
method, it's either findAll
or find_all
. findAll
and find_all
are equivalent, as documented here
Next up I would suggest altering your code so you aren't making use of the find_all
method quite so often - using find
instead will improve the efficiency, especially for large XML files. Additionally the code below is easier to read and debug:
from bs4 import BeautifulSoup
xml_file = open('./path_to_file.xml', 'r')
soup = BeautifulSoup(xml_file, "xml")
with open('./path_to_output_f.txt', 'a') as f:
for s in soup.findAll('active-user-record'):
username = s.find('active-user-name').text
auth = s.find('authentication-realm').text
role = s.find('user-roles').text
node = s.find('login-node').text
f.write("{}\t{}\t{}\t{}\n".format(username, auth, role, node))
Hope this helps. Let me know if you require any further assistance!
The solution is simple: don't use findall
method - use find_all
.
Why? Because there is no findall
method at all, there are findAll
and find_all
, which are equivalent. See docs for more information.
Though, I agree, error message is confusing.
Hope that helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With