I have a data analysis pipeline that consists of multiple steps. I built it with Snakemake (which is new to me), and the output of every task (and the input of the next) is a pickle file containing either a DataFrame or a list of DataFrames. Everything works except that I cannot open the pickle files manually. Of note, the pipeline uses a dedicated conda environment.
import _pickle
with open("testb/first/out/stacks.pkl", "rb") as f:
    data = _pickle.load(f)
I get this error:
AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from 'C:\\Users\\sebde\\anaconda3\\envs\\dbm\\lib\\site-packages\\pandas\\_libs\\internals.cp39-win_amd64.pyd'
Python 3.10.2, Snakemake-minimal 7.0.4 (as per documentation, I'm on Windows), Pandas 1.4.1
You probably need to use the same pandas version when loading the file as when saving it.
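One way to check whether the versions differ is to print the interpreter and package versions in both places: the environment that writes the pickle and the one that reads it. This is only a minimal sketch; the helper name `environment_report` is hypothetical, and the package list is an assumption you can adjust.

```python
import sys
from importlib import metadata


def environment_report(packages=("pandas", "joblib")):
    """Collect the interpreter path, Python version, and package versions."""
    report = {"executable": sys.executable, "python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the two reports show different pandas versions (or different interpreters), that is a likely cause of the error.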
I got a similar error, AttributeError: Can't get attribute '_unpickle_block', in this situation:
import joblib
import pandas as pd  # 1.4.3
df = pd.DataFrame({"a": [1], "b": [2]})  # all-scalar values would require an index
joblib.dump(df, "train.dump", compress=True)
# --------------------------------
# When loading this, pandas 1.3.5 is installed.
import joblib
def load(path):
    return joblib.load(path)
train_data = load("train.dump")
Traceback (most recent call last):
  File "/home/ec2-user/work/tools/script.py", line 45, in <module>
    train_data = load("train.dump")
  File "/home/ec2-user/work/tools/script.py", line 16, in load
    return joblib.load(path)
  File "/home/ec2-user/work/.venv/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/ec2-user/work/.venv/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 1538, in load_stack_global
    self.append(self.find_class(module, name))
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 1582, in find_class
    return _getattribute(sys.modules[module], name)[0]
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 331, in _getattribute
    raise AttributeError("Can't get attribute {!r} on {!r}"
AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from '/home/ec2-user/work/.venv/lib/python3.10/site-packages/pandas/_libs/internals.cpython-310-x86_64-linux-gnu.so'>
I was using pandas 1.3.5 when I loaded the file,
but the DataFrame in "train.dump" had been saved with pandas 1.4.3.
So I reinstalled pandas 1.4.3 and ran joblib.load again.
After that, I could load "train.dump".
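To catch this kind of mismatch before the unpickler fails, one option is to write a small version header ahead of the data and check it on load. The sketch below uses plain `pickle` rather than joblib; the names `dump_with_version` and `load_with_version` are hypothetical, and requiring an exact version match is my assumption (it may be stricter than necessary).

```python
import pickle
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def dump_with_version(obj, path, package="pandas"):
    """Write a version header, then the object, as two pickles in one file."""
    header = {"package": package, "version": installed_version(package)}
    with open(path, "wb") as f:
        pickle.dump(header, f)  # plain dict: loadable without the package
        pickle.dump(obj, f)


def load_with_version(path):
    """Read the header first and refuse to unpickle on a version mismatch."""
    with open(path, "rb") as f:
        header = pickle.load(f)
        current = installed_version(header["package"])
        if header["version"] != current:
            raise RuntimeError(
                f"{path} was written with {header['package']} "
                f"{header['version']}, but {current} is installed"
            )
        return pickle.load(f)
```

Because the header is a plain dict written as its own pickle, it can be read even when the data that follows cannot be unpickled, so the error message names the version to install.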
See also the answers to the related question: AttributeError: Can't get attribute '_unpickle_block'