
Issues with Python pandas: read_html and python3-lxml installation

I'm trying to run the following code, to no avail. To my knowledge, there aren't any syntax errors.

import quandl
import pandas as pd

fifty_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(fifty_states)

I'm getting the following error when I run this code:

Traceback (most recent call last):

File "C:/Users/Dave/Documents/Python Files/helloworld.py", line 15, in fiddy_states = pd.read_html('http://simple.wikipedia.org/wiki/List_of_U.S._states')

File "C:\Python35\lib\site-packages\pandas\io\html.py", line 874, in read_html parse_dates, tupleize_cols, thousands, attrs, encoding)

File "C:\Python35\lib\site-packages\pandas\io\html.py", line 726, in _parse parser = _parser_dispatch(flav)

File "C:\Python35\lib\site-packages\pandas\io\html.py", line 685, in _parser_dispatch raise ImportError("lxml not found, please install it")

ImportError: lxml not found, please install it

Not too sure why this is occurring, as I (should) have all the packages required to run this code. I have problems installing lxml and python3-lxml, as the packages fail to install. As a backup, I've installed the following:

python-dev libxml2-dev libxslt1-dev zlib1g-dev

in addition to 'html5lib', which I've read is a suitable replacement for lxml.

Not sure what else to do at this point, since the fixes I've found by searching (i.e. installing lxml) don't apply to me (I can't install lxml in any format via pip on the command line).

Any help is much appreciated.

Edit: It appears that lxml was never installed on my computer. It's weird, because I'm unable to install it via pip install lxml. Here're the error logs I get when attempting an install:

Collecting lxml
  Using cached lxml-3.6.4.tar.gz
Building wheels for collected packages: lxml
  Running setup.py bdist_wheel for lxml ... error
  Complete output from command c:\python35\python.exe -u -c "import setuptools,
tokenize;__file__='C:\\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\\l
xml\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().rep
lace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d C:\Users\Dwang\AppData\Lo
cal\Temp\tmpm9z4yol6pip-wheel- --python-tag cp35:
  Building lxml version 3.6.4.
  Building without Cython.
  ERROR: b"'xslt-config' is not recognized as an internal or external command,\r
\noperable program or batch file.\r\n"
  ** make sure the development packages of libxml2 and libxslt are installed **

  Using build configuration of libxslt
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.5
  creating build\lib.win-amd64-3.5\lxml
  copying src\lxml\builder.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\cssselect.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\doctestcompare.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\ElementInclude.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\pyclasslookup.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\sax.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\usedoctest.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\_elementpath.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\__init__.py -> build\lib.win-amd64-3.5\lxml
  creating build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\__init__.py -> build\lib.win-amd64-3.5\lxml\includes

  creating build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\builder.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\clean.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\defs.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\diff.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\ElementSoup.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\formfill.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\html5parser.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\soupparser.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\usedoctest.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\_diffcommand.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\_html5builder.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\_setmixin.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\__init__.py -> build\lib.win-amd64-3.5\lxml\html
  creating build\lib.win-amd64-3.5\lxml\isoschematron
  copying src\lxml\isoschematron\__init__.py -> build\lib.win-amd64-3.5\lxml\iso
schematron
  copying src\lxml\lxml.etree.h -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\lxml.etree_api.h -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\includes\c14n.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\config.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\dtdvalid.pxd -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\etreepublic.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
  copying src\lxml\includes\htmlparser.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
  copying src\lxml\includes\relaxng.pxd -> build\lib.win-amd64-3.5\lxml\includes

  copying src\lxml\includes\schematron.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
  copying src\lxml\includes\tree.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\uri.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\xinclude.pxd -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\xmlerror.pxd -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\xmlparser.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
  copying src\lxml\includes\xmlschema.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
  copying src\lxml\includes\xpath.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\xslt.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\etree_defs.h -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\lxml-version.h -> build\lib.win-amd64-3.5\lxml\inclu
des
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\rng
  copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib.w
in-amd64-3.5\lxml\isoschematron\resources\rng
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl
  copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win-a
md64-3.5\lxml\isoschematron\resources\xsl
  copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win-a
md64-3.5\lxml\isoschematron\resources\xsl
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematr
on-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstract
_expand.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sche
matron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_inc
lude.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schemat
ron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematr
on_message.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-s
chematron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematr
on_skeleton_for_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resource
s\xsl\iso-schematron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_for
_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schem
atron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt -
> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
  running build_ext
  building 'lxml.etree' extension
  error: Unable to find vcvarsall.bat

  ----------------------------------------
  Failed building wheel for lxml
  Running setup.py clean for lxml
Failed to build lxml
Installing collected packages: lxml
  Running setup.py install for lxml ... error
    Complete output from command c:\python35\python.exe -u -c "import setuptools
, tokenize;__file__='C:\\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\
\lxml\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().r
eplace('\r\n', '\n'), __file__, 'exec'))" install --record C:\Users\Dwang\AppDat
a\Local\Temp\pip-4_tf2u3a-record\install-record.txt --single-version-externally-
managed --compile:
    Building lxml version 3.6.4.
    Building without Cython.
    ERROR: b"'xslt-config' is not recognized as an internal or external command,
\r\noperable program or batch file.\r\n"
    ** make sure the development packages of libxml2 and libxslt are installed *
*

    Using build configuration of libxslt
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.5
    creating build\lib.win-amd64-3.5\lxml
    copying src\lxml\builder.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\cssselect.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\doctestcompare.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\ElementInclude.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\pyclasslookup.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\sax.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\usedoctest.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\_elementpath.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\__init__.py -> build\lib.win-amd64-3.5\lxml
    creating build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\__init__.py -> build\lib.win-amd64-3.5\lxml\includ
es
    creating build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\builder.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\clean.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\defs.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\diff.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\ElementSoup.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\formfill.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\html5parser.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\soupparser.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\usedoctest.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\_diffcommand.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\_html5builder.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\_setmixin.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\__init__.py -> build\lib.win-amd64-3.5\lxml\html
    creating build\lib.win-amd64-3.5\lxml\isoschematron
    copying src\lxml\isoschematron\__init__.py -> build\lib.win-amd64-3.5\lxml\i
soschematron
    copying src\lxml\lxml.etree.h -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\lxml.etree_api.h -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\includes\c14n.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\config.pxd -> build\lib.win-amd64-3.5\lxml\include
s
    copying src\lxml\includes\dtdvalid.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\etreepublic.pxd -> build\lib.win-amd64-3.5\lxml\in
cludes
    copying src\lxml\includes\htmlparser.pxd -> build\lib.win-amd64-3.5\lxml\inc
ludes
    copying src\lxml\includes\relaxng.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
    copying src\lxml\includes\schematron.pxd -> build\lib.win-amd64-3.5\lxml\inc
ludes
    copying src\lxml\includes\tree.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\uri.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\xinclude.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\xmlerror.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\xmlparser.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
    copying src\lxml\includes\xmlschema.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
    copying src\lxml\includes\xpath.pxd -> build\lib.win-amd64-3.5\lxml\includes

    copying src\lxml\includes\xslt.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\etree_defs.h -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\lxml-version.h -> build\lib.win-amd64-3.5\lxml\inc
ludes
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\rng
    copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib
.win-amd64-3.5\lxml\isoschematron\resources\rng
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl
    copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win
-amd64-3.5\lxml\isoschematron\resources\xsl
    copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win
-amd64-3.5\lxml\isoschematron\resources\xsl
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schema
tron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstra
ct_expand.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sc
hematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_i
nclude.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schem
atron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schema
tron_message.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso
-schematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schema
tron_skeleton_for_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resour
ces\xsl\iso-schematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_f
or_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sch
ematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt
 -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt
1
    running build_ext
    building 'lxml.etree' extension
    error: Unable to find vcvarsall.bat

    ----------------------------------------
Command "c:\python35\python.exe -u -c "import setuptools, tokenize;__file__='C:\
\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\\lxml\\setup.py';exec(co
mpile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __
file__, 'exec'))" install --record C:\Users\Dwang\AppData\Local\Temp\pip-4_tf2u3
a-record\install-record.txt --single-version-externally-managed --compile" faile
d with error code 1 in C:\Users\Dwang\AppData\Local\Temp\pip-build-738bf61u\lxml
\
wowdavers asked Aug 28 '16 23:08



2 Answers

From what I understand, and according to the docs, if read_html() fails to use lxml, it should fall back to html5lib. It looks like that does not happen in your case, and an error is thrown instead.

Try to explicitly state the flavor:

fifty_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states', flavor='html5lib')
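
One thing that can trip this up: the html5lib flavor needs BeautifulSoup4 (the bs4 package) installed alongside html5lib, so it helps to check what the interpreter can actually see before picking a flavor. Here is a rough sketch of that check; the helper names is_installed and pick_flavor are made up for illustration and are not part of pandas.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


def pick_flavor(installed=is_installed):
    """Choose a read_html flavor from the parser packages that are present.

    Returns "lxml", "html5lib", or None when neither parser stack is usable.
    """
    if installed("lxml"):
        return "lxml"
    # The html5lib flavor needs both BeautifulSoup4 (bs4) and html5lib.
    if installed("bs4") and installed("html5lib"):
        return "html5lib"
    return None


print("usable flavor:", pick_flavor())
```

If this prints a flavor, passing it as flavor=pick_flavor() to pd.read_html should avoid the missing-parser ImportError. If it prints None, the parser packages still need to be installed first.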
alecxe answered Oct 22 '22 12:10



Try

$ conda install -c conda-forge lxml
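
Keep in mind that conda installs into the active conda environment, so the script has to be run with that environment's Python rather than a separate install such as the C:\Python35 one in the traceback. A quick way to confirm which interpreter is running and whether it can see lxml, as a sketch (the function name describe_environment is hypothetical):

```python
import importlib.util
import sys


def describe_environment(module_name="lxml"):
    """Report which interpreter is running and whether it can find the module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "found": spec is not None,
        # origin is the file the module would be loaded from, if known
        "location": getattr(spec, "origin", None),
    }


print(describe_environment())
```

If "found" is False, or "interpreter" points somewhere other than the conda environment, the install went to a different Python than the one running the script.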
Yoshiyasu Takefuji answered Oct 22 '22 12:10
