I have a series of images stored in a CSV file as one string per image; each string is a list of 9216 space-separated integers. I have a function that converts this string to a 96x96 NumPy array.
I wish to store this NumPy array in a column of my DataFrame instead of the string.
However, when I retrieve the item from the column, it is no longer usable as a NumPy array.
The data can be downloaded from here; the images are in the last column of the training.csv file.
https://www.kaggle.com/c/facial-keypoints-detection/data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

df_train = pd.read_csv("training.csv")

def convert_to_np_arr(im_as_str):
    # Parse the space-separated pixel values and reshape them to 96x96
    im = [int(i) for i in im_as_str.split()]
    im = np.asarray(im)
    im = im.reshape((96, 96))
    return im

# Store one 96x96 array per row in a new column
df_train['Im_as_np'] = df_train.Image.apply(convert_to_np_arr)
im = df_train.Im_as_np[0]
plt.imshow(im, cmap=cm.Greys_r)
plt.show()
If, instead of applying the function and storing the result in the DataFrame, I run the same code directly on a single image, it works as expected:
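import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

df_train = pd.read_csv("training.csv")

# Convert only the first image, without storing it back in the DataFrame
im = df_train.Image[0]
im = [int(i) for i in im.split()]
im = np.asarray(im)
im = im.reshape((96, 96))
plt.imshow(im, cmap=cm.Greys_r)
plt.show()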
Pandas is generally not a suitable data structure for handling images. The usual assumption with Pandas is that the number of columns is much smaller than the number of rows. This doesn't have to be true, and for DataFrames that are small in both dimensions it rarely matters. But for mathematical operations that are naturally spatial, the relational structure of a DataFrame is not appropriate, and this becomes more apparent as the number of columns grows. Given this, I would suggest using NumPy's CSV-reading abilities and working with the data as a 2D array or as an image object, e.g. with scikit-image (formerly scikits.image).
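As a rough illustration of keeping the images in NumPy rather than in a DataFrame column, here is a minimal sketch. It assumes a tiny in-memory stand-in for training.csv with hypothetical 2x2 images (the names csv_text, side and images are made up for this example); for the Kaggle file, side would be 96 and the CSV would be read from disk instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for training.csv: two 2x2 "images" instead of 96x96.
csv_text = 'label,Image\n0,"1 2 3 4"\n1,"5 6 7 8"\n'
df = pd.read_csv(io.StringIO(csv_text))

side = 2  # assumption for this toy data; 96 for the real images

# Parse each space-separated string and stack the results into a single
# array of shape (n_images, side, side), kept outside the DataFrame.
images = np.stack([
    np.array([int(v) for v in s.split()]).reshape(side, side)
    for s in df["Image"]
])

print(images.shape)
```

With this layout, images[i] is an ordinary 2D array that can be passed to plt.imshow, and the remaining columns can stay in the DataFrame, matched to the images by row position.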