
Deep Learning Udacity course: Problem 2, Assignment 1 (notMNIST)

After reading this and taking the courses, I am struggling to solve the second problem of assignment 1 (notMNIST):

Let's verify that the data still looks good. Displaying a sample of the labels and images from the ndarray. Hint: you can use matplotlib.pyplot.

Here is what I tried:

import random
rand_smpl = [ train_datasets[i] for i in sorted(random.sample(xrange(len(train_datasets)), 1)) ]
print(rand_smpl)
filename = rand_smpl[0]
import pickle
loaded_pickle = pickle.load( open( filename, "r" ) )
image_size = 28  # Pixel width and height.
import numpy as np
dataset = np.ndarray(shape=(len(loaded_pickle), image_size, image_size),
                         dtype=np.float32)
import matplotlib.pyplot as plt

plt.plot(dataset[2])
plt.ylabel('some numbers')
plt.show()

but this is what I get:

[screenshot of the resulting plot]

which doesn't make much sense. To be honest, my code may not make much sense either, since I am not really sure how to tackle this problem!


The pickles are created like this:

image_size = 28  # Pixel width and height.
pixel_depth = 255.0  # Number of levels per pixel.

def load_letter(folder, min_num_images):
  """Load the data for a single letter label."""
  image_files = os.listdir(folder)
  dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                         dtype=np.float32)
  print(folder)
  num_images = 0
  for image in image_files:
    image_file = os.path.join(folder, image)
    try:
      image_data = (ndimage.imread(image_file).astype(float) - 
                    pixel_depth / 2) / pixel_depth
      if image_data.shape != (image_size, image_size):
        raise Exception('Unexpected image shape: %s' % str(image_data.shape))
      dataset[num_images, :, :] = image_data
      num_images = num_images + 1
    except IOError as e:
      print('Could not read:', image_file, ':', e, '- it\'s ok, skipping.')
    
  dataset = dataset[0:num_images, :, :]
  if num_images < min_num_images:
    raise Exception('Many fewer images than expected: %d < %d' %
                    (num_images, min_num_images))
    
  print('Full dataset tensor:', dataset.shape)
  print('Mean:', np.mean(dataset))
  print('Standard deviation:', np.std(dataset))
  return dataset

where that function is called like this:

  dataset = load_letter(folder, min_num_images_per_class)
  try:
    with open(set_filename, 'wb') as f:
      pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)

The idea here is:

Now let's load the data in a more manageable format. Since, depending on your computer setup you might not be able to fit it all in memory, we'll load each class into a separate dataset, store them on disk and curate them independently. Later we'll merge them into a single dataset of manageable size.

We'll convert the entire dataset into a 3D array (image index, x, y) of floating point values, normalized to have approximately zero mean and standard deviation ~0.5 to make training easier down the road.
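A quick sanity check on one of the saved pickles (just a sketch, reusing set_filename from the snippet above) confirms the shape and the roughly-zero mean / ~0.5 standard deviation the notebook aims for:

import pickle
import numpy as np

# reload one per-letter pickle (set_filename as used in pickle.dump above);
# note that pickles must be opened in binary mode ('rb')
with open(set_filename, 'rb') as f:
    letter_set = pickle.load(f)

print(letter_set.shape)     # (num_images, 28, 28)
print(np.mean(letter_set))  # should be close to 0
print(np.std(letter_set))   # should be roughly 0.5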

asked Jul 04 '16 by gsamaras


1 Answer

Do it as below:

import numpy as np
import matplotlib.pyplot as plt

# define a function to convert a numeric label (0-9) to its letter
def letter(i):
    return 'abcdefghij'[i]

# you need %matplotlib inline to be able to show images inside an IPython/Jupyter notebook
%matplotlib inline

# pick a random index in the range 0 .. len(train_dataset) - 1
sample_idx = np.random.randint(0, len(train_dataset))

# now show that sample as an image, with its letter as the title
plt.imshow(train_dataset[sample_idx])
plt.title("Char " + letter(train_labels[sample_idx]))

Your code actually changed what dataset is: np.ndarray(shape=...) just allocates a new, uninitialized array, and nothing is ever copied into it, so dataset is not the (220000, 28, 28) ndarray of letters you loaded. On top of that, plt.plot draws 1-D line plots rather than images, which is why the output makes no sense.
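You can see this with a small check (just a sketch, reusing filename from your snippet; each per-letter pickle stores a plain (num_images, 28, 28) ndarray):

import pickle
import matplotlib.pyplot as plt

# open pickles in binary mode ('rb'), not text mode ('r')
with open(filename, 'rb') as f:
    loaded_pickle = pickle.load(f)

print(type(loaded_pickle), loaded_pickle.shape)  # numpy.ndarray, (num_images, 28, 28)

# the unpickled object already is the dataset, so just show one slice as an image
plt.imshow(loaded_pickle[2])
plt.show()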

In general, a pickle file stores serialized Python objects; loading it gives those objects back, so there is no need to allocate a new ndarray yourself. Use the unpickled object directly to get your train dataset (using the notation from your code snippet):

# will give you train_dataset and train_labels
train_dataset = loaded_pickle['train_dataset']
train_labels = loaded_pickle['train_labels']
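Putting it together, a minimal end-to-end version might look like this (a sketch, assuming the merged pickle is saved as notMNIST.pickle and holds a dict with these keys, as in the course notebook):

import pickle
import numpy as np
import matplotlib.pyplot as plt

def letter(i):
    return 'abcdefghij'[i]

# 'notMNIST.pickle' is the merged file written later in the notebook (assumed name)
with open('notMNIST.pickle', 'rb') as f:
    loaded_pickle = pickle.load(f)

train_dataset = loaded_pickle['train_dataset']  # (num_images, 28, 28) float32 images
train_labels = loaded_pickle['train_labels']    # (num_images,) integer labels 0..9

sample_idx = np.random.randint(0, len(train_dataset))
plt.imshow(train_dataset[sample_idx], cmap='gray')  # imshow, not plot, for 2-D image data
plt.title("Char " + letter(train_labels[sample_idx]))
plt.show()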

UPDATED:

Per request from @gsamaras, the link to my solution for the whole Assignment 1 is here.

The code is commented and mostly self-explanatory, but if you have any questions feel free to contact me on GitHub in whatever way you prefer.

answered Sep 30 '22 by Maksim Khaitovich