Cannot open pickle file generated by snakemake pipeline

I have a data analysis pipeline that consists of multiple steps. I have generated a snakemake pipeline (new for me) and the output of every task (and input of the next task) is a pickle file containing either a DataFrame or a list of DataFrames. Everything is fine except I cannot open the pickle files manually. Of note, the pipeline uses a dedicated conda environment.

import pickle

# Load the pickle written by the first pipeline step
with open("testb/first/out/stacks.pkl", "rb") as f:
    data = pickle.load(f)

I get this error:

AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from 'C:\\Users\\sebde\\anaconda3\\envs\\dbm\\lib\\site-packages\\pandas\\_libs\\internals.cp39-win_amd64.pyd'

Python 3.10.2, Snakemake-minimal 7.0.4 (as per documentation, I'm on Windows), Pandas 1.4.1

SebDL asked Mar 05 '26 17:03


1 Answer

You probably need to make the pandas version used to load the file match the one that was used to save it.
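To compare the two environments, a small report like the one below could be run once inside the pipeline's conda environment and once in the interpreter used to open the file manually. This is only a sketch: the helper name `environment_report` and the default package list are my own choices, not something from snakemake or pandas.

```python
import sys
from importlib import metadata


def environment_report(packages=("pandas", "joblib")):
    """Collect the interpreter path, Python version, and package versions.

    `environment_report` is a hypothetical helper; the package names are
    only examples and can be replaced with whatever the pipeline uses.
    """
    report = {
        "executable": sys.executable,
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the pandas line differs between the two runs, the version mismatch is the likely cause.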

I ran into a similar error, AttributeError: Can't get attribute '_unpickle_block', in the following situation.

import joblib
import pandas as pd  # 1.4.3


# Column values must be list-like; all-scalar values without an index raise ValueError
df = pd.DataFrame({"a": [1], "b": [2]})
joblib.dump(df, "train.dump", compress=True)

# --------------------------------

# When loading this, pandas 1.3.5 is installed.
import joblib


def load(path):
    return joblib.load(path)


train_data = load("train.dump")
Traceback (most recent call last):
  File "/home/ec2-user/work/tools/script.py", line 45, in <module>
    train_data = load("train.dump")
  File "/home/ec2-user/work/tools/script.py", line 16, in load
    return joblib.load(path)
  File "/home/ec2-user/work/.venv/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/ec2-user/work/.venv/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 1538, in load_stack_global
    self.append(self.find_class(module, name))
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 1582, in find_class
    return _getattribute(sys.modules[module], name)[0]
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 331, in _getattribute
    raise AttributeError("Can't get attribute {!r} on {!r}"
AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from '/home/ec2-user/work/.venv/lib/python3.10/site-packages/pandas/_libs/internals.cpython-310-x86_64-linux-gnu.so'>

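The file "train.dump" had been saved from a DataFrame created with pandas 1.4.3, but pandas 1.3.5 was installed when I tried to load it. As the traceback shows, the pickle refers to _unpickle_block, and the older pandas._libs.internals module has no attribute with that name.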

After reinstalling pandas 1.4.3, I ran joblib.load again and it loaded "train.dump" without errors.
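One way to catch this kind of mismatch earlier is to record the library version next to the pickle and check it before loading. The sketch below is only an illustration under my own assumptions: `dump_with_version`, `load_with_version_check`, and the ".meta.json" sidecar file are hypothetical names, not part of pandas, joblib, or snakemake.

```python
import json
import pickle
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def dump_with_version(obj, path, package="pandas", version=None):
    """Pickle `obj` to `path` and write the producing library version beside it.

    Hypothetical helper: the sidecar file name (path + ".meta.json") is an
    assumption made for this example.
    """
    if version is None:
        version = installed_version(package)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    with open(path + ".meta.json", "w") as f:
        json.dump({"package": package, "version": version}, f)


def load_with_version_check(path, current_version=None):
    """Load the pickle only if the recorded version matches the current one."""
    with open(path + ".meta.json") as f:
        meta = json.load(f)
    if current_version is None:
        current_version = installed_version(meta["package"])
    if meta["version"] != current_version:
        raise RuntimeError(
            f"{path} was written with {meta['package']} {meta['version']}, "
            f"but {current_version} is installed"
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

With this, loading from an environment that has a different pandas version fails with a message naming both versions instead of the AttributeError above. An exact match is stricter than necessary, since many version pairs are compatible, but it keeps the check simple.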

See also this related question: AttributeError: Can't get attribute '_unpickle_block'

siruku6 answered Mar 07 '26 06:03


