I'm following the BeautifulSoup tutorial, but when I try to parse an XML page using the lxml library I get the following error:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml,xml. Do you need to install a parser library?
I have already installed lxml by every method I could find: easy_install, pip, port, etc. To check whether lxml is actually installed, I added this line to my code:
import lxml
Python runs this import without complaining, yet the same error message appears again at the same line.
So I am fairly sure that lxml is installed, just not installed correctly. I decided to uninstall lxml and re-install it using a 'correct' method. But when I type in
easy_install -m lxml
I get the following error:
Searching for lxml
Best match: lxml 3.2.1
Processing lxml-3.2.1-py2.7-macosx-10.6-intel.egg
Using /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml-3.2.1-py2.7-macosx-10.6-intel.egg
Because this distribution was installed --multi-version, before you can
import modules from this package in an application, you will need to
'import pkg_resources' and then use a 'require()' call similar to one of
these examples, in order to select the desired version:
    pkg_resources.require("lxml")  # latest installed version
    pkg_resources.require("lxml==3.2.1")  # this exact version
    pkg_resources.require("lxml>=3.2.1")  # this version or higher
Processing dependencies for lxml
Finished processing dependencies for lxml
So I don't know how to continue with the uninstall. I have looked up many posts about this issue on Google, but I still can't find any useful information. Here is my code:
import sys  # needed for sys.argv below

import mechanize
from bs4 import BeautifulSoup
import lxml

class count:
    def __init__(self, protein):
        self.proteinCode = protein
        self.br = mechanize.Browser()

    def first_search(self):
        # Test 0
        soup = BeautifulSoup(self.br.open("http://www.ncbi.nlm.nih.gov/protein/21225921?report=genbank&log$=prottop&blast_rank=1&RID=YGJHMSET015"), ['lxml', 'xml'])
        return

if __name__ == '__main__':
    proteinCode = sys.argv[1]
    gogogo = count(proteinCode)
You can check whether the lxml package is installed by running the pip show lxml command. It will either state that the package is not installed or show information about the package, including the location where it is installed.
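The same check can be made from inside Python, which also shows which interpreter is doing the looking; a package installed for a different Python than the one running your script can produce exactly this kind of confusion. This is a minimal sketch, assuming Python 3.4 or later (the thread itself uses Python 2.7); `describe_module` is a hypothetical helper name, not part of pip or lxml.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether a top-level module is visible to this interpreter."""
    # find_spec returns None when the module cannot be found on sys.path.
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,  # the Python binary running this code
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    print(describe_module("lxml"))
```

If "found" is False here while pip show lxml reports a location, the two are probably talking about different Python installations.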
In case you want to use the current in-development version of lxml, you can get it from the GitHub repository at https://github.com/lxml/lxml. Note that this requires Cython to build the sources; see the build instructions on the project home page.
I am using BeautifulSoup 4.3.2 and OS X 10.6.8. I also have a problem with an improperly installed lxml. Here are some things that I found out:
First of all, check this related question: Removed MacPorts, now Python is broken
Now, in order to check which builders for BeautifulSoup 4 are installed, try
>>> import bs4
>>> bs4.builder.builder_registry.builders
If you don't see your favorite builder, then it is not installed, and you will see an error as above ("Couldn't find a tree builder...").
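For a more readable view of the registry, each builder class can be listed together with the feature names it answers to (such as 'lxml' or 'xml'). This is a minimal sketch, assuming Python 3 and that builder classes expose a `features` attribute; the code falls back to an empty list if they do not. `registered_builders` is a hypothetical helper name.

```python
def registered_builders():
    """Return (class name, features) pairs for every registered bs4 builder.

    Returns an empty list when bs4 itself cannot be imported.
    """
    try:
        from bs4.builder import builder_registry
    except ImportError:
        return []
    return [
        # 'features' is assumed to exist on builder classes; default to [].
        (builder.__name__, list(getattr(builder, "features", [])))
        for builder in builder_registry.builders
    ]


if __name__ == "__main__":
    for name, features in registered_builders():
        print(name, features)
```

If no line in the output mentions 'lxml', the lxml builder was not registered, which matches the "Couldn't find a tree builder" error.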
Also, just because you can import lxml doesn't mean that everything is perfect.
Try
>>> import lxml
>>> import lxml.etree
To understand what's going on, go to the bs4 installation and open the egg (tar -xvzf). Notice the bs4.builder package. Inside it you should see files such as _lxml.py and _html5lib.py. So you can also try
>>> import bs4.builder._htmlparser
>>> import bs4.builder._lxml
>>> import bs4.builder._html5lib
If there is a problem, you will see why a particular module cannot be loaded. Notice how the end of builder/__init__.py loads all those modules and ignores whatever could not be loaded:
# Builders are registered in reverse order of priority, so that custom
# builder registrations will take precedence. In general, we want lxml
# to take precedence over html5lib, because it's faster. And we only
# want to use HTMLParser as a last result.
from . import _htmlparser
register_treebuilders_from(_htmlparser)
try:
    from . import _html5lib
    register_treebuilders_from(_html5lib)
except ImportError:
    # They don't have html5lib installed.
    pass
try:
    from . import _lxml
    register_treebuilders_from(_lxml)
except ImportError:
    # They don't have lxml installed.
    pass
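Because those except ImportError clauses swallow the real cause, it can help to attempt the same imports yourself and print whatever exception comes back. This is a minimal sketch, assuming Python 3; `try_import` is a hypothetical helper, and the module names in the usage lines are the ones mentioned in this answer.

```python
import importlib


def try_import(module_names):
    """Try to import each module; map its name to None or an error string."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None  # imported cleanly
        except Exception as exc:
            # Keep the exception type and message so the cause is visible.
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    names = ["lxml", "lxml.etree", "bs4.builder._lxml"]
    for name, error in try_import(names).items():
        print(name, "OK" if error is None else error)
```

A failure on lxml.etree while plain lxml imports fine would suggest the compiled extension is the broken part rather than the package as a whole.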
If you are using Python 2.7 on Ubuntu/Debian, this worked for me:
$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml
Test it like:
mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml