I have a problem using the pyarrow.orc module in Anaconda on Windows 10.
import pyarrow.orc as orc
throws an exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\apps\Anaconda3\envs\ws\lib\site-packages\pyarrow\orc.py", line 23, in <module>
import pyarrow._orc as _orc
ModuleNotFoundError: No module named 'pyarrow._orc'
On the other hand:
import pyarrow
works without any issues.
conda list
# packages in environment at C:\apps\Anaconda3\envs\ws:
#
# Name Version Build Channel
arrow-cpp 0.13.0 py37h49ee12d_0
...
numpy 1.17.3 py37h4ceb530_0
numpy-base 1.17.3 py37hc3f5095_0
...
pip 19.3.1 py37_0
pyarrow 0.13.0 py37ha925a31_0
...
python 3.7.5 h8c8aaf0_0
...
I've tried other versions of pyarrow with the same results.
conda -V
conda 4.7.12
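One way to narrow this down is to look at which compiled extension modules the installed build actually ships. The sketch below is only an illustration: the helper name `list_extension_modules` is hypothetical, and it assumes the extensions sit directly inside the package directory, as the traceback path suggests.

```python
import importlib.machinery
import os


def list_extension_modules(package_dir):
    """Return the sorted names of compiled extension modules in package_dir.

    Hypothetical helper: it only inspects file names and imports nothing.
    """
    names = set()
    for filename in os.listdir(package_dir):
        # EXTENSION_SUFFIXES holds the platform's suffixes, for example
        # '.pyd' on Windows or '.so' on Linux, longest variants first.
        for suffix in importlib.machinery.EXTENSION_SUFFIXES:
            if filename.endswith(suffix):
                names.add(filename[: -len(suffix)])
                break
    return sorted(names)
```

Since `import pyarrow` works, calling `list_extension_modules(os.path.dirname(pyarrow.__file__))` should list modules such as `lib`. If `_orc` is absent from that list, the build was most likely compiled without ORC support, and reinstalling the same build would not be expected to help.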
Bottom line up front, I had the same error. This was the solution for me:
!pip install pyarrow==0.13.0
I'm not sure this is limited to Windows 10. I have been getting the same error in AWS Sagemaker for the last few days. It was working fine before, on a previous Sagemaker instance.
Using the Conda Packages menu in Jupyter, the conda_python3 kernel showed it had pyarrow 0.13.0 installed from https://repo.anaconda.com/pkgs/main/linux-64, build py36he6710b0_0.
However, a subsequent call to
!conda list
did not show pyarrow as being in the Jupyter conda_python3 kernel, even after restarting the kernel.
Normally in a Sagemaker [Jupyter notebook] instance, I would use !pip commands because they just seem to work better, and don't have the timeout errors I sometimes find with the Conda Packages menu. (Also, I don't need to worry about passing -y flags; the installs just happen.)
Normally !pip install pyarrow
was working, but I noticed it was installing pyarrow 0.15.1 from Nov 1, 2019.
Perhaps there is an error in that version with loading the _orc package, or some other conflicting library.
My intuition is that something is wrong with both the conda build of pyarrow 0.13.0 and the pip build of pyarrow 0.15.1.
In a Jupyter cell I tried this:
!pip uninstall pyarrow -y
!pip install pyarrow
from pyarrow import orc
Output:
Uninstalling pyarrow-0.15.1:
Successfully uninstalled pyarrow-0.15.1
Collecting pyarrow
Downloading https://files.pythonhosted.org/packages/6c/32/ce1926f05679ea5448fd3b98fbd9419d8c7a65f87d1a12ee5fb9577e3a8e/pyarrow-0.15.1-cp36-cp36m-manylinux2010_x86_64.whl (59.2MB)
|████████████████████████████████| 59.2MB 381kB/s eta 0:00:01
Requirement already satisfied: numpy>=1.14 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow) (1.14.3)
Requirement already satisfied: six>=1.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow) (1.11.0)
Installing collected packages: pyarrow
Successfully installed pyarrow-0.15.1
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-6-36378dee5a25> in <module>()
1 get_ipython().system('pip uninstall pyarrow -y')
2 get_ipython().system('pip install pyarrow')
----> 3 from pyarrow import orc
~/anaconda3/envs/python3/lib/python3.6/site-packages/pyarrow/orc.py in <module>()
23 from pyarrow import types
24 from pyarrow.lib import Schema
---> 25 import pyarrow._orc as _orc
26
27
ModuleNotFoundError: No module named 'pyarrow._orc'
Note that when you try to uninstall pyarrow 0.15.1 and install a specific older version, like 0.13.0, you should restart the kernel after uninstalling. There are some incompatible binaries that get left behind. I did not post that output because it was so long.
!pip uninstall pyarrow -y
Restart Kernel, then:
!pip install pyarrow==0.13.0
from pyarrow import orc
Output:
Collecting pyarrow==0.13.0
Using cached https://files.pythonhosted.org/packages/ad/25/094b122d828d24b58202712a74e661e36cd551ca62d331e388ff68bae91d/pyarrow-0.13.0-cp36-cp36m-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.14 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow==0.13.0) (1.14.3)
Requirement already satisfied: six>=1.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow==0.13.0) (1.11.0)
Installing collected packages: pyarrow
Successfully installed pyarrow-0.13.0
There is now no error from the import command, and orc files can be read again.
The ORC reader is not supported at all on Windows and, to my knowledge, never has been. The Apache ORC C++ library is not yet known to build with the Visual Studio C++ compiler, so Windows builds of pyarrow do not include the pyarrow._orc extension.
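If code has to run on platforms both with and without ORC support, the import can be guarded instead of left to fail. This is a minimal sketch using the standard library's `importlib.util.find_spec`. The helper name `has_submodule` is made up for illustration, and the check only tells you that the module can be located, not that it will load cleanly.

```python
import importlib.util


def has_submodule(package, submodule):
    """Return True if package.submodule can be located.

    Hypothetical helper. find_spec imports the parent package, but not
    the submodule itself.
    """
    try:
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ModuleNotFoundError:
        # Raised when the parent package itself is not installed.
        return False


if __name__ == "__main__":
    # Prints False on builds that ship without the ORC extension.
    print(has_submodule("pyarrow", "_orc"))
```

When the check returns False, the caller can skip ORC-specific code paths or report a clear message rather than surfacing the ModuleNotFoundError shown in the question.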