
How to re-install lxml?

Python version and Device used

  • Python 2.7.5
  • Mac OS X 10.7.5
  • BeautifulSoup 4.2.1

I'm following the BeautifulSoup tutorial, but when I try to parse an XML page using the lxml library, I get the following error:

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml,xml. Do you need to install a parser library? 

I am sure I have already installed lxml, having tried every method: easy_install, pip, port, etc. To check whether lxml is installed, I added this line to my code:

import lxml 

Python executes this line without complaint, but the same error message then appears again, at the same line as before.

So I am fairly sure that lxml is installed, just not installed correctly. I decided to uninstall lxml and then re-install it using a 'correct' method. But when I type

easy_install -m  lxml 

I get the following error:

Searching for lxml
Best match: lxml 3.2.1
Processing lxml-3.2.1-py2.7-macosx-10.6-intel.egg

Using /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml-3.2.1-py2.7-macosx-10.6-intel.egg

Because this distribution was installed --multi-version, before you can
import modules from this package in an application, you will need to
'import pkg_resources' and then use a 'require()' call similar to one of
these examples, in order to select the desired version:

    pkg_resources.require("lxml")  # latest installed version
    pkg_resources.require("lxml==3.2.1")  # this exact version
    pkg_resources.require("lxml>=3.2.1")  # this version or higher

Processing dependencies for lxml
Finished processing dependencies for lxml

So I don't know how to continue the uninstall. I have looked through many posts about this issue on Google, but I still can't find any useful information.


Here is my source code:

import sys

import mechanize
from bs4 import BeautifulSoup
import lxml


class count:
    def __init__(self, protein):
        self.proteinCode = protein
        self.br = mechanize.Browser()

    def first_search(self):
        # Test 0
        soup = BeautifulSoup(self.br.open("http://www.ncbi.nlm.nih.gov/protein/21225921?report=genbank&log$=prottop&blast_rank=1&RID=YGJHMSET015"), ['lxml', 'xml'])
        return


if __name__ == '__main__':
    proteinCode = sys.argv[1]
    gogogo = count(proteinCode)

Questions

  1. How can I uninstall lxml?
  2. How can I install lxml 'correctly'? How do I know that it is correctly installed?
Mark23333 asked Jul 20 '13 21:07



2 Answers

I am using BeautifulSoup 4.3.2 on OS X 10.6.8, and I also have a problem with an improperly installed lxml. Here are some things that I found out:

First of all, check this related question: Removed MacPorts, now Python is broken

Now, in order to check which builders for BeautifulSoup 4 are installed, try

>>> import bs4
>>> bs4.builder.builder_registry.builders

If you don't see your favorite builder, then it is not installed, and you will see an error as above ("Couldn't find a tree builder...").
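The registry check above can be made a little easier to read by printing each builder next to the feature names it answers to. This is only a sketch: the helper name `registered_builders` is made up for illustration, and it assumes bs4 is importable from the interpreter you run it with (it returns `None` otherwise).

```python
def registered_builders():
    """Return [(builder class name, feature list), ...] or None if bs4 is missing."""
    try:
        from bs4.builder import builder_registry
    except ImportError:
        return None  # bs4 itself cannot be imported here
    # Each registered builder is a class with a `features` attribute.
    return [(b.__name__, list(b.features)) for b in builder_registry.builders]


if __name__ == "__main__":
    builders = registered_builders()
    if builders is None:
        print("bs4 is not importable from this interpreter")
    else:
        for name, features in builders:
            print(name, features)
```

If no line in the output lists both `lxml` and `xml` among its features, a request for `['lxml', 'xml']` has nothing to match, which is consistent with the error in the question.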

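Also, just because you can import lxml doesn't mean that everything is working. The lxml builder in bs4 needs lxml.etree, which is a compiled extension module and can fail to load even when the top-level package imports fine.

Try both:

>>> import lxml
>>> import lxml.etree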
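The two imports can also be wrapped in a small report that shows which interpreter is running and where lxml was loaded from, which helps when several Pythons (system, MacPorts, python.org) are installed side by side. A minimal sketch; `check_lxml` is a hypothetical helper name, not something lxml or bs4 provides.

```python
import sys


def check_lxml():
    """Describe whether lxml and lxml.etree import in this interpreter."""
    report = {"python": sys.executable, "lxml": None, "etree": None}
    try:
        import lxml
        # Path of the package that was actually picked up.
        report["lxml"] = getattr(lxml, "__file__", "imported")
    except ImportError as exc:
        report["lxml"] = "FAILED: %s" % exc
        return report
    try:
        from lxml import etree
        # LXML_VERSION is a tuple of integers.
        report["etree"] = "OK, lxml %s" % ".".join(map(str, etree.LXML_VERSION))
    except ImportError as exc:
        # The compiled extension could not be loaded.
        report["etree"] = "FAILED: %s" % exc
    return report


if __name__ == "__main__":
    for key, value in check_lxml().items():
        print("%-7s %s" % (key, value))
```

If `lxml` reports a path but `etree` reports a failure, the package is present but its compiled part is broken for this interpreter, which is the "installed, but not correctly" situation.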

To understand what's going on, go to the bs4 installation and unpack the egg (tar -xvzf). Look at the bs4.builder package: inside it you should see files such as _lxml.py and _html5lib.py. So you can also try importing those modules directly:

>>> import bs4.builder._htmlparser
>>> import bs4.builder._lxml
>>> import bs4.builder._html5lib

If there is a problem, you will see why a particular module cannot be loaded. Notice how, at the end of builder/__init__.py, it loads all those modules and silently ignores whatever could not be loaded:

# Builders are registered in reverse order of priority, so that custom
# builder registrations will take precedence. In general, we want lxml
# to take precedence over html5lib, because it's faster. And we only
# want to use HTMLParser as a last result.
from . import _htmlparser
register_treebuilders_from(_htmlparser)
try:
    from . import _html5lib
    register_treebuilders_from(_html5lib)
except ImportError:
    # They don't have html5lib installed.
    pass
try:
    from . import _lxml
    register_treebuilders_from(_lxml)
except ImportError:
    # They don't have lxml installed.
    pass
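Because that registration code swallows ImportError, the real reason a builder is missing never reaches you. One way to surface it is to import each builder module yourself and keep the exception text. This is a sketch under the assumption that your bs4 version uses the module names shown above; `probe` and `MODULES` are names invented for the example.

```python
import importlib
import traceback

# Builder modules named in the bs4 source quoted above.
MODULES = [
    "bs4.builder._htmlparser",
    "bs4.builder._html5lib",
    "bs4.builder._lxml",
]


def probe(modules=MODULES):
    """Import each module and return {module name: 'ok' or the error line}."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception:
            # Keep only the last traceback line, e.g. "ImportError: ...".
            results[name] = traceback.format_exc().strip().splitlines()[-1]
    return results


if __name__ == "__main__":
    for name, status in probe().items():
        print("%-26s %s" % (name, status))
```

A failure reported for `bs4.builder._lxml` points at the lxml installation itself, while a failure for `bs4.builder._html5lib` only means html5lib is absent and does not affect XML parsing.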
Sergey Orshanskiy answered Oct 03 '22 22:10


If you are using Python 2.7 on Ubuntu/Debian, this worked for me:

$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml

Test it like this:

mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
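A plain `import lxml` only shows that the package is on the path. A stricter test, sketched below, asks BeautifulSoup for the same `['lxml', 'xml']` features used in the question and parses a tiny document. The function name `can_parse_xml_with_lxml` and the sample markup are made up for illustration.

```python
def can_parse_xml_with_lxml():
    """Return True if bs4 can build an XML tree through lxml, else False."""
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError:
        return False  # bs4 is not importable at all
    try:
        # Same feature list as in the question's BeautifulSoup call.
        soup = BeautifulSoup("<root><item>ok</item></root>", ["lxml", "xml"])
    except FeatureNotFound:
        return False  # no registered builder offers these features
    return soup.find("item").get_text() == "ok"


if __name__ == "__main__":
    print("lxml XML builder usable:", can_parse_xml_with_lxml())
```

If this prints `False` even though `import lxml` succeeds, the problem is most likely in loading `lxml.etree` rather than in BeautifulSoup.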
like image 22
Mona Jalal Avatar answered Oct 03 '22 21:10

Mona Jalal