I have a class that performs analyses and attaches the results, which are pandas dataframes, as object attributes:
>>> print(test.image.locate_DF)
y x mass ... raw_mass ep frame
0 60.177142 59.788709 33.433414 ... 242.080256 NaN 0
1 60.651991 59.773904 33.724308 ... 242.355784 NaN 1
2 60.790437 60.190234 31.117164 ... 236.276671 NaN 2
3 60.771933 60.048123 33.558372 ... 240.981395 NaN 3
4 60.251282 59.775139 31.881009 ... 239.239022 NaN 4
... ... ... ... ... ... ... ...
7212 68.186380 76.477449 18.122817 ... 176.523091 NaN 9410
7213 68.764444 76.574091 17.486454 ... 173.448306 NaN 9415
7214 68.191152 76.473477 17.402975 ... 172.848119 0.868326 9429
7215 67.034103 76.025885 17.010951 ... 170.928067 -0.600854 9431
7216 68.583276 75.309592 17.852992 ... 178.271558 NaN 9432
Subsequently, I save all the important object attributes in a dictionary, and pickle it for later use:
def save_parameters(self, filepath):
param_dict = {}
try:
self.image.locate_DF
except AttributeError:
pass
else:
param_dict['optical_locate_DF'] = self.image.locate_DF
with open(filepath, 'wb') as handle:
pickle.dump(param_dict, handle, 5)
When trying to load that pickled file, I have no problem at all, the dataframe loads perfectly:
>>> test.save_parameters('test.pickle')
>>> with open('test.pickle', 'rb') as handle:
... result = pickle.load(handle)
...
>>> print(result.keys())
dict_keys(['optical_path', 'optical_feature_diameter', 'optical_feature_minmass', 'optical_locate_DF', 'electrical_path', 'electrical_raw_data', 'electrical_processed_data', 'electrical_mean_voltage'])
>>> print(result['optical_locate_DF'])
y x mass ... raw_mass ep frame
0 60.177142 59.788709 33.433414 ... 242.080256 NaN 0
1 60.651991 59.773904 33.724308 ... 242.355784 NaN 1
2 60.790437 60.190234 31.117164 ... 236.276671 NaN 2
3 60.771933 60.048123 33.558372 ... 240.981395 NaN 3
4 60.251282 59.775139 31.881009 ... 239.239022 NaN 4
... ... ... ... ... ... ... ...
7212 68.186380 76.477449 18.122817 ... 176.523091 NaN 9410
7213 68.764444 76.574091 17.486454 ... 173.448306 NaN 9415
7214 68.191152 76.473477 17.402975 ... 172.848119 0.868326 9429
7215 67.034103 76.025885 17.010951 ... 170.928067 -0.600854 9431
7216 68.583276 75.309592 17.852992 ... 178.271558 NaN 9432
[7217 rows x 9 columns]
However, after running my analysis on a bunch of these files on a hpc, and then trying to open that same pickled file (it's named differently now but it's the same file as shown above, with the same analysis performed on it), I get thrown an attribute error by pandas. It states that the dataframe has no '_data' attribute. The dictionary has the same keys and the keys that are not a dataframe are printed without any issues:
>>> resultfile = '../results/diam_15_minmass_17_dist_50_mem_5000_tracklength_500/R9_DNA_50mV_001.pickle'
>>> with open(resultfile, 'rb') as handle:
... result = pickle.load(handle)
...
>>> print(result.keys())
dict_keys(['optical_path', 'optical_feature_diameter', 'optical_feature_minmass', 'optical_locate_DF', 'optical_tracking_distance', 'optical_tracking_memory', 'optical_tracking_DF', 'optical_kinetics_DF', 'electrical_path', 'electrical_raw_data', 'electrical_processed_data', 'electrical_mean_voltage'])
>>> print(result['optical_locate_DF'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/core/frame.py", line 680, in __repr__
self.to_string(
File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/core/frame.py", line 801, in to_string
formatter = fmt.DataFrameFormatter(
File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/io/formats/format.py", line 593, in __init__
self.max_rows_displayed = min(max_rows or len(self.frame), len(self.frame))
File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/core/frame.py", line 1041, in __len__
return len(self.index)
File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/core/generic.py", line 5270, in __getattr__
return object.__getattribute__(self, name)
File "pandas/_libs/properties.pyx", line 63, in pandas._libs.properties.AxisProperty.__get__
File "/Users/stevenvanuytsel/miniconda3/envs/simultaneous_measurements/lib/python3.8/site-packages/pandas/core/generic.py", line 5270, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute '_data'
I've looked into the pickle manual, and through a bunch of SO questions, but I can't seem to find out what is going wrong here. Does anyone have an idea how to fix this, and also whether I can still access that data?
Fix error while creating the dataframe If we use dataframe it will throw an error because there is no dataframe attribute in pandas. The method is DataFrame(). We need to pass any dictionary as an argument. Since the dictionary has a key, value pairs we can pass it as an argument.
What is a DataFrame? A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows and columns.
DataFrame. DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.
I had the same problem. I generated a Pandas dataframe in an environment with Pandas 1.1.1 and saved it to a pickle file.
with open('file.pkl', 'wb') as f:
pickle.dump(data_frame_object, f)
After unpickling it in another session and printing the dataframe I got the same error. Some testing in different environments showed the following pattern:
I got the same error using the HDF5 format so it seems to be a compatibility issue with the dataframe and different Pandas versions.
Updating Pandas to 1.1.1 in the affected environments solved the issue for me.
After a long and painful process of cross-checking module versions, I found out that this error was caused due to an update in the pandas version. My mac still ran pandas 1.0.5, whereas the hpc runs pandas 1.1.0. Apparently, there is a mismatch between the two (unsure whether it's just after pickling or also for other file formats used to save).
Maybe the problem has been solved.
Emmm, but I still want to add some comments.
I save the pkl file on the server, but when I load it on my MAC, it crashed, showing 'Dataframe' object has no attribute '_data'
Finally, I found that pandas on my Mac is 1.0.5 but 1.1.5 on the server. When I updated it to the latest, it just worked.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With