I use PySpark 2.4.0, and I got the following error when I executed this code in the pyspark shell:
$ ./bin/pyspark
Python 2.7.16 (default, Mar 25 2019, 15:07:04)
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/
Using Python version 2.7.16 (default, Mar 25 2019 15:07:04)
SparkSession available as 'spark'.
>>> from pyspark.sql.functions import pandas_udf
>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> from pyspark.sql.types import IntegerType, StringType
>>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/x/spark/python/pyspark/sql/functions.py", line 2922, in pandas_udf
return _create_udf(f=f, returnType=return_type, evalType=eval_type)
File "/Users/x/spark/python/pyspark/sql/udf.py", line 47, in _create_udf
require_minimum_pyarrow_version()
File "/Users/x/spark/python/pyspark/sql/utils.py", line 149, in require_minimum_pyarrow_version
"it was not found." % minimum_pyarrow_version)
ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.
How can I fix it?
The error message in this case is misleading; pyarrow simply wasn't installed for the Python interpreter that Spark was using.
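You can confirm this by trying to import pyarrow with the same interpreter. The output below is just what you would typically see in that case:
$ python -c "import pyarrow"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named pyarrow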
As described in the official Spark SQL Guide (which links to Installing PyArrow), you simply need to execute one of the following commands:
$ conda install -c conda-forge pyarrow
or
$ pip install pyarrow
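After installing pyarrow, the original snippet should work. As a quick sanity check you can run something like the example from the pandas_udf documentation (the sample row here is made up, but the result is what you should see):
>>> from pyspark.sql.functions import pandas_udf
>>> from pyspark.sql.types import IntegerType
>>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
>>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
>>> df.select(slen("name").alias("slen(name)")).show()
+----------+
|slen(name)|
+----------+
|         8|
+----------+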
It is also important to run the command as the proper user and against the proper Python version. For example, if you are running Zeppelin under root with Python 3, you may need to execute
# pip3 install pyarrow
instead.
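If you are unsure which interpreter Spark actually uses, a check along these lines may help (the interpreter path shown is only an example; adjust it to your environment):
$ python3 -c "import sys; print(sys.executable)"
/usr/bin/python3
$ /usr/bin/python3 -m pip install pyarrow
# point Spark at the same interpreter
$ export PYSPARK_PYTHON=/usr/bin/python3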