I'm using pandas in a container and I get the following error:
Traceback (most recent call last):
File "/volumes/dependencies/site-packages/celery/app/trace.py", line 374, in trace_task
R = retval = fun(*args, **kwargs)
File "/volumes/dependencies/site-packages/celery/app/trace.py", line 629, in __protected_call__
return self.run(*args, **kwargs)
File "/volumes/code/autoai/celery/data_template/api.py", line 16, in run_data_template_task
data_template.run(data_bundle, columns=columns)
File "/volumes/code/autoai/models/data_template.py", line 504, in run
self.to_parquet(data_bundle, columns=columns)
File "/volumes/code/autoai/models/data_template.py", line 162, in to_parquet
}, parquet_path=data_file.path, directory="", dataset=self)
File "/volumes/code/autoai/core/datasets/parquet_converter.py", line 46, in convert
file_system.write_dataframe(parquet_path, chunk, directory, append=append)
File "/volumes/code/autoai/core/file_systems.py", line 76, in write_dataframe
append=append)
File "/volumes/dependencies/site-packages/pandas/core/frame.py", line 1945, in to_parquet
compression=compression, **kwargs)
File "/volumes/dependencies/site-packages/pandas/io/parquet.py", line 256, in to_parquet
impl = get_engine(engine)
File "/volumes/dependencies/site-packages/pandas/io/parquet.py", line 40, in get_engine
return FastParquetImpl()
File "/volumes/dependencies/site-packages/pandas/io/parquet.py", line 180, in __init__
import fastparquet
File "/volumes/dependencies/site-packages/fastparquet/__init__.py", line 8, in <module>
from .core import read_thrift
File "/volumes/dependencies/site-packages/fastparquet/core.py", line 13, in <module>
from . import encoding
File "/volumes/dependencies/site-packages/fastparquet/encoding.py", line 11, in <module>
from .speedups import unpack_byte_array
File "__init__.pxd", line 861, in init fastparquet.speedups
ValueError: numpy.ufunc has the wrong size, try recompiling. Expected 192, got 216
I read in other answers that this message appears when pandas was compiled against a newer numpy version than the one installed. But updating both pandas and numpy did not work for me. I tried to find out whether I have several versions of numpy installed, but pip show numpy
seems to show only the latest version.
Also, oddly, this happens only when I deploy locally and not on the server.
Any ideas how to go about fixing this? Or at least how to debug my numpy and pandas versions (if there are multiple versions, how do I check that)?
I tried upgrading both packages and removing and reinstalling them. No help there.
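For anyone trying to answer the "are there multiple numpy copies?" part: a short, stdlib-only snippet can show which numpy the interpreter would actually load and whether more than one copy sits on sys.path (the paths printed are whatever your environment contains; nothing here is specific to my setup):

```python
import importlib.util
import os
import sys

# Which numpy would 'import numpy' actually pick up, and from where?
spec = importlib.util.find_spec("numpy")
if spec is None:
    print("numpy is not importable from", sys.executable)
else:
    print("numpy would load from:", spec.origin)

# More than one 'numpy' directory on sys.path suggests duplicate
# installs shadowing each other (e.g. a system copy vs. a user copy).
hits = [os.path.join(p, "numpy") for p in sys.path
        if p and os.path.isdir(os.path.join(p, "numpy"))]
print("numpy directories on sys.path:", hits)
```

If the `spec.origin` path differs from the location `pip show numpy` reports, the interpreter is loading a different copy than the one pip manages.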
TL;DR: If you're in Docker, add:
RUN pip install numpy
before you install pandas (probably just before your pip install -r requirements.txt) and it will work again.
I am building pandas in Alpine inside Docker and ran into the same issue; it just popped up (around Dec 27th, 2018) for a build that had been working fine previously.
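As a sketch, the Dockerfile change looks like this (the base image and requirements file name are placeholders for whatever your build uses):

```dockerfile
FROM python:3.7-alpine

# Install numpy first, on its own layer, so that pandas/fastparquet
# compile against the same numpy ABI that is present at runtime.
RUN pip install numpy

COPY requirements.txt .
RUN pip install -r requirements.txt
```

The point is only the ordering: numpy must already be installed when pip builds pandas and fastparquet from source, so their compiled extensions match the numpy that will be imported later.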