
Unpickling dictionary that holds pandas dataframes throws AttributeError: 'DataFrame' object has no attribute '_data'

I have a class that performs analyses and attaches the results, which are pandas dataframes, as object attributes:

>>> print(test.image.locate_DF)
              y          x       mass  ...    raw_mass        ep  frame
0     60.177142  59.788709  33.433414  ...  242.080256       NaN      0
1     60.651991  59.773904  33.724308  ...  242.355784       NaN      1
2     60.790437  60.190234  31.117164  ...  236.276671       NaN      2
3     60.771933  60.048123  33.558372  ...  240.981395       NaN      3
4     60.251282  59.775139  31.881009  ...  239.239022       NaN      4
...         ...        ...        ...  ...         ...       ...    ...
7212  68.186380  76.477449  18.122817  ...  176.523091       NaN   9410
7213  68.764444  76.574091  17.486454  ...  173.448306       NaN   9415
7214  68.191152  76.473477  17.402975  ...  172.848119  0.868326   9429
7215  67.034103  76.025885  17.010951  ...  170.928067 -0.600854   9431
7216  68.583276  75.309592  17.852992  ...  178.271558       NaN   9432

Subsequently, I save all the important object attributes in a dictionary, and pickle it for later use:

def save_parameters(self, filepath):
    param_dict = {}

    # Only store the dataframe if the analysis has actually produced it.
    try:
        self.image.locate_DF
    except AttributeError:
        pass
    else:
        param_dict['optical_locate_DF'] = self.image.locate_DF

    # Pickle protocol 5 requires Python 3.8 or newer.
    with open(filepath, 'wb') as handle:
        pickle.dump(param_dict, handle, 5)

When I load that pickled file, I have no problem at all; the dataframe loads perfectly:

>>> test.save_parameters('test.pickle')
>>> with open('test.pickle', 'rb') as handle:
...     result = pickle.load(handle)
...
>>> print(result.keys())
dict_keys(['optical_path', 'optical_feature_diameter', 'optical_feature_minmass', 'optical_locate_DF', 'electrical_path', 'electrical_raw_data', 'electrical_processed_data', 'electrical_mean_voltage'])
>>> print(result['optical_locate_DF'])
              y          x       mass  ...    raw_mass        ep  frame
0     60.177142  59.788709  33.433414  ...  242.080256       NaN      0
1     60.651991  59.773904  33.724308  ...  242.355784       NaN      1
2     60.790437  60.190234  31.117164  ...  236.276671       NaN      2
3     60.771933  60.048123  33.558372  ...  240.981395       NaN      3
4     60.251282  59.775139  31.881009  ...  239.239022       NaN      4
...         ...        ...        ...  ...         ...       ...    ...
7212  68.186380  76.477449  18.122817  ...  176.523091       NaN   9410
7213  68.764444  76.574091  17.486454  ...  173.448306       NaN   9415
7214  68.191152  76.473477  17.402975  ...  172.848119  0.868326   9429
7215  67.034103  76.025885  17.010951  ...  170.928067 -0.600854   9431
7216  68.583276  75.309592  17.852992  ...  178.271558       NaN   9432

[7217 rows x 9 columns]

However, after running my analysis on a batch of these files on an HPC cluster and then trying to open the same pickled file (it is named differently now, but it is the same file as shown above, with the same analysis performed on it), pandas throws an AttributeError stating that the DataFrame has no '_data' attribute. The dictionary has the same keys, and the values that are not dataframes print without any issues:

>>> resultfile = '../results/diam_15_minmass_17_dist_50_mem_5000_tracklength_500/R9_DNA_50mV_001.pickle'
>>> with open(resultfile, 'rb') as handle:
...     result = pickle.load(handle)
...
>>> print(result.keys())
dict_keys(['optical_path', 'optical_feature_diameter', 'optical_feature_minmass', 'optical_locate_DF', 'optical_tracking_distance', 'optical_tracking_memory', 'optical_tracking_DF', 'optical_kinetics_DF', 'electrical_path', 'electrical_raw_data', 'electrical_processed_data', 'electrical_mean_voltage'])
>>> print(result['optical_locate_DF'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/core/frame.py", line 680, in __repr__
    self.to_string(
  File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/core/frame.py", line 801, in to_string
    formatter = fmt.DataFrameFormatter(
  File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/io/formats/format.py", line 593, in __init__
    self.max_rows_displayed = min(max_rows or len(self.frame), len(self.frame))
  File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/core/frame.py", line 1041, in __len__
    return len(self.index)
  File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/core/generic.py", line 5270, in __getattr__
    return object.__getattribute__(self, name)
  File "pandas/_libs/properties.pyx", line 63, in pandas._libs.properties.AxisProperty.__get__
  File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/core/generic.py", line 5270, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute '_data'

I've looked into the pickle manual, and through a bunch of SO questions, but I can't seem to find out what is going wrong here. Does anyone have an idea how to fix this, and also whether I can still access that data?

Steven asked Aug 24 '20 10:08


3 Answers

I had the same problem. I generated a Pandas dataframe in an environment with Pandas 1.1.1 and saved it to a pickle file.

with open('file.pkl', 'wb') as f:
    pickle.dump(data_frame_object, f)

After unpickling it in another session and printing the dataframe I got the same error. Some testing in different environments showed the following pattern:

  • environment with Pandas >= 1.1.0: works
  • environment with Pandas == 1.0.5: error message as above
  • environment with Pandas == 1.0.3: Kernel crashes

I got the same error using the HDF5 format, so it seems to be a compatibility issue between how the dataframe is stored and the pandas version that reads it back.

Updating Pandas to 1.1.1 in the affected environments solved the issue for me.
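
A quick way to see which side of that version boundary an environment is on is to compare the installed pandas version against a minimum before unpickling. The sketch below is only an illustration: the helper names (parse_version, installed_version, meets_minimum) are made up for this example, and the 1.1.0 threshold comes from the tests reported above, not from any pandas compatibility guarantee.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '1.1.0' into (1, 1, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.1' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(current, minimum):
    """True when the version string `current` is at least `minimum`."""
    return parse_version(current) >= parse_version(minimum)


if __name__ == "__main__":
    # Assumption: 1.1.0 is the threshold observed in this answer.
    current = installed_version("pandas")
    if current is None:
        print("pandas is not installed in this environment")
    elif meets_minimum(current, "1.1.0"):
        print(f"pandas {current}: at or above the version that wrote the file")
    else:
        print(f"pandas {current}: older than 1.1.0, upgrade before unpickling")
```

Running this in both the environment that wrote the pickle and the one that reads it shows at a glance whether the two are on different sides of the boundary.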

BodoB answered Oct 11 '22 06:10


After a long and painful process of cross-checking module versions, I found out that this error was caused by a difference in pandas versions. My Mac still ran pandas 1.0.5, whereas the HPC runs pandas 1.1.0. Apparently, the two are incompatible here (I am unsure whether this only affects pickling or also other file formats used for saving).
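
One way to make such a mismatch visible before a dataframe is ever touched, sketched here under assumptions, is to write the pandas version into the file as a first pickle record and the actual dictionary as a second one. The reader can then inspect the stamp without unpickling the payload. The function names (dump_stamped, read_stamp, load_stamped) are hypothetical, and the exact-match rule is a deliberately strict choice for this example, not something pandas requires.

```python
import pickle


def dump_stamped(payload, filepath, writer_version):
    """Write the version stamp as a first pickle record, then the payload as a second."""
    with open(filepath, "wb") as handle:
        pickle.dump(writer_version, handle)
        pickle.dump(payload, handle)


def read_stamp(filepath):
    """Return only the version stamp; the payload is not unpickled."""
    with open(filepath, "rb") as handle:
        return pickle.load(handle)


def load_stamped(filepath, reader_version):
    """Load the payload only if the reader's version matches the stamp."""
    with open(filepath, "rb") as handle:
        writer_version = pickle.load(handle)
        if writer_version != reader_version:
            raise RuntimeError(
                f"file was written with pandas {writer_version}, "
                f"but this environment has pandas {reader_version}"
            )
        # The second pickle.load call reads the next record in the same file.
        return pickle.load(handle)
```

In save_parameters this would amount to calling dump_stamped(param_dict, filepath, pd.__version__) with pandas imported as pd, and passing pd.__version__ again as reader_version when loading.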

Steven answered Oct 11 '22 05:10


The problem may already be solved, but I still want to add a comment.

I saved the .pkl file on the server, but when I loaded it on my Mac it crashed with: 'DataFrame' object has no attribute '_data'

It turned out that pandas on my Mac was 1.0.5, while the server had 1.1.5. After I updated pandas on the Mac to the latest version, it worked.

LimingFang answered Oct 11 '22 07:10