
Image regression with CNN

My immediate problem is that all of the various CNN regression models I've tried always return the same (or very similar) values and I'm trying to figure out why. But I would be open to a wide range of suggestions.

My dataset looks like this:

  • x: 64x64 greyscale images arranged into a 64 x 64 x n ndarray
  • y: Values between 0 and 1, each corresponding to an image (think of this as some sort of proportion)
  • weather: 4 weather readings from the time each image was taken (ambient temperature, humidity, dewpoint, air pressure)

The goal is to use the images and weather data to predict y. Since I'm working with images, I thought a CNN would be appropriate (please let me know if there are other strategies here).

From what I understand, CNNs are most often used for classification tasks--it's rather unusual to use them for regression. But in theory, it shouldn't be too different--I just need to change the loss function to MSE/RMSE and the last activation function to linear (although maybe a sigmoid is more appropriate here since y is between 0 and 1).

The first hurdle I ran into was trying to figure out how to incorporate the weather data, and the natural choice was to incorporate them into the first fully connected layer. I found an example here: How to train mix of image and data in CNN using ImageAugmentation in TFlearn
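Merging at the first fully connected layer amounts to concatenating the two feature sets along the feature axis. Here is a shape-only sketch in NumPy, assuming a batch of 8 and using random arrays as stand-ins for the real layer output and weather readings (the names `image_features` and `weather` are illustrative, not from the linked example):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8  # hypothetical batch size

# Stand-in for the output of the first fully connected layer (4096 units).
image_features = rng.normal(size=(n, 4096))

# Stand-in for the 4 weather readings per image.
weather = rng.normal(size=(n, 4))

# Concatenate along the feature axis so each row carries both sources.
merged = np.concatenate([image_features, weather], axis=1)

print(merged.shape)  # (8, 4100)
```

The next fully connected layer then sees 4100 inputs per example, with the weather readings in the last four columns.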

The second hurdle I ran into was determining an architecture. Normally I would just pick a paper and copy its architecture, but I couldn't find anything on CNN image regression. So I tried a (fairly simple) network with 3 convolutional layers and 2 fully connected layers, then I tried VGGNet and AlexNet architectures from https://github.com/tflearn/tflearn/tree/master/examples

Now the problem I'm having is that all of the models I'm trying output the same value, namely the mean y of the training set. Looking at tensorboard, the loss function flattens out fairly quickly (after around 25 epochs). Do you know what's going on here? While I do understand the basics of what each layer is doing, I have no intuition on what makes a good architecture for a particular dataset or task.

Here is an example. I am using VGGNet from the tflearn examples page:

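import tensorflow as tf
import tflearn
from tflearn.data_augmentation import ImageAugmentation
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import dropout, fully_connected, input_data
from tflearn.layers.estimator import regression
from tflearn.layers.merge_ops import merge

# size, learning_rate and batch_size are defined elsewhere (not shown).
tf.reset_default_graph()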

img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_flip_updown()
img_aug.add_random_90degrees_rotation(rotations=[0, 1, 2, 3])

convnet = input_data(shape=[None, size, size, 1], 
                     data_augmentation=img_aug, 
                     name='hive')
weathernet = input_data(shape=[None, 4], name='weather')

convnet = conv_2d(convnet, 64, 3, activation='relu', scope='conv1_1')
convnet = conv_2d(convnet, 64, 3, activation='relu', scope='conv1_2')
convnet = max_pool_2d(convnet, 2, strides=2, name='maxpool1')

convnet = conv_2d(convnet, 128, 3, activation='relu', scope='conv2_1')
convnet = conv_2d(convnet, 128, 3, activation='relu', scope='conv2_2')
convnet = max_pool_2d(convnet, 2, strides=2, name='maxpool2')

convnet = conv_2d(convnet, 256, 3, activation='relu', scope='conv3_1')
convnet = conv_2d(convnet, 256, 3, activation='relu', scope='conv3_2')
convnet = conv_2d(convnet, 256, 3, activation='relu', scope='conv3_3')
convnet = max_pool_2d(convnet, 2, strides=2, name='maxpool3')

convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv4_1')
convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv4_2')
convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv4_3')
convnet = max_pool_2d(convnet, 2, strides=2, name='maxpool4')

convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv5_1')
convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv5_2')
convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv5_3')
convnet = max_pool_2d(convnet, 2, strides=2, name='maxpool5')

convnet = fully_connected(convnet, 4096, activation='relu', scope='fc6')
convnet = merge([convnet, weathernet], 'concat')
convnet = dropout(convnet, .75, name='dropout1')

convnet = fully_connected(convnet, 4096, activation='relu', scope='fc7')
convnet = dropout(convnet, .75, name='dropout2')

convnet = fully_connected(convnet, 1, activation='sigmoid', scope='fc8')

convnet = regression(convnet, 
                     optimizer='adam', 
                     learning_rate=learning_rate, 
                     loss='mean_square', 
                     name='targets')

model = tflearn.DNN(convnet, 
                    tensorboard_dir='log', 
                    tensorboard_verbose=0)

model.fit({
            'hive': x_train,
            'weather': weather_train  
          },
          {'targets': y_train}, 
          n_epoch=1000, 
          batch_size=batch_size,
          validation_set=({
              'hive': x_val,
              'weather': weather_val
          }, 
                          {'targets': y_val}), 
          show_metric=False, 
          shuffle=True,
          run_id='poop')

To clarify what my objects are:

  • x_train is an ndarray of shape (n, 64, 64, 1)
  • weather_train is an ndarray of shape (n, 4)
  • y_train is an ndarray of shape (n, 1)

Overfitting is another concern, but given that the models perform poorly on the training set, I think I can worry about that later.

Asked by johnkoo, May 14 '26 18:05


1 Answer

To address your concern about getting the same predicted value for every instance: you have a few options that don't involve changing the structure of your conv net:

  1. Rescale your target variable using sklearn's StandardScaler() (which standardizes values by removing the mean and scaling to unit variance).
  2. Scale the pixel data. Performance generally improves with scaled inputs; as a rule of thumb, divide 8-bit pixel values by 255.0 (shown at the end of this post).
  3. Experiment with the learning rate and the loss function. The CNN outputs the same value for every prediction because that is the point of minimum error it has found.
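The first two options can be sketched with plain NumPy. This is a minimal illustration under assumptions: the arrays are random stand-ins for the real data, and the manual standardization is meant to mirror what StandardScaler computes (mean and population standard deviation), not to replace it. `to_original_units` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (assumed): 8-bit greyscale images and targets in [0, 1].
x_train = rng.integers(0, 256, size=(100, 64, 64, 1), dtype=np.uint8)
x_val = rng.integers(0, 256, size=(20, 64, 64, 1), dtype=np.uint8)
y_train = rng.uniform(0.0, 1.0, size=(100, 1))
y_val = rng.uniform(0.0, 1.0, size=(20, 1))

# Option 2: scale 8-bit pixel values into [0, 1].
x_train_scaled = x_train.astype(np.float32) / 255.0
x_val_scaled = x_val.astype(np.float32) / 255.0

# Option 1: standardize the target using statistics from the training
# split only, then apply the same statistics to the validation split.
y_mean = y_train.mean(axis=0)
y_std = y_train.std(axis=0)
y_train_std = (y_train - y_mean) / y_std
y_val_std = (y_val - y_mean) / y_std


def to_original_units(pred_std):
    """Map predictions made in standardized space back to the original scale."""
    return pred_std * y_std + y_mean
```

Computing the statistics on the training split alone keeps information from the validation set out of the preprocessing, and `to_original_units` is needed before reporting errors in the original units of y.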

Next, if you are performing regression, make sure your final fully connected layer uses a linear activation function instead of sigmoid. A linear activation passes the weighted sum of the neuron's inputs through unchanged, so the output is proportional to that sum rather than squashed into a fixed range.

convnet = fully_connected(convnet, 1, activation='linear', scope='fc8')
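To make the difference concrete, here is a small standard-library sketch comparing the two activations on a few hypothetical pre-activation values. One case where the choice matters: if the target has been standardized, it takes negative values and values above 1, which a sigmoid output cannot produce. If the target stays in [0, 1], a sigmoid can still reach it, so treat this as an illustration rather than a rule.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(z):
    """Linear (identity) activation: returns its input unchanged."""
    return z


# Hypothetical pre-activation values (weighted sums) for the output neuron.
pre_activations = [-5.0, -1.0, 0.0, 1.0, 5.0]

sigmoid_out = [sigmoid(z) for z in pre_activations]
linear_out = [linear(z) for z in pre_activations]

for z, s, l in zip(pre_activations, sigmoid_out, linear_out):
    print(f"z={z:+.1f}  sigmoid={s:.4f}  linear={l:+.1f}")
```

Every sigmoid output lies strictly between 0 and 1, while the linear outputs cover the same range as the inputs.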

Lastly, I recently implemented ResNet50 for regression tasks in Keras. The construction of that network is below. This version does not support loading pretrained weights, and it must receive images of shape (224, 224, 3).

from keras.layers import (Activation, Add, BatchNormalization, Conv2D, Dense,
                          GlobalAveragePooling2D, Input, MaxPooling2D,
                          ZeroPadding2D)
from keras.models import Model
from keras import backend


def block1(x, filters, kernel_size=3, stride=1, conv_shortcut=True, name=None):
    """
    A residual block

    :param x: input tensor
    :param filters: integer, filters of the bottleneck layer
    :param kernel_size: kernel size of bottleneck
    :param stride: stride of first layer
    :param conv_shortcut: use convolution shortcut if true, otherwise identity shortcut
    :param name: string, block label
    :return: Output tensor of the residual block

    """

    # bn_axis = 3 if backend.image_data_format() == 'channels_last' else 1

    bn_axis = -1

    if conv_shortcut is True:
        shortcut = Conv2D(4 * filters, 1, strides=stride, name=name+'_0_conv')(x)
        shortcut = BatchNormalization(axis=bn_axis, epsilon=1.001e-5, name=name+'_0_bn')(shortcut)
    else:
        shortcut = x

    x = Conv2D(filters, 1, strides=stride, name=name+'_1_conv')(x)
    x = BatchNormalization(axis=bn_axis, epsilon=1.001e-5, name=name+'_1_bn')(x)
    x = Activation('relu', name=name+'_1_relu')(x)

    x = Conv2D(filters, kernel_size, padding='SAME', name=name+'_2_conv')(x)
    x = BatchNormalization(axis=bn_axis, epsilon=1.001e-5, name=name+'_2_bn')(x)
    x = Activation('relu', name=name+'_2_relu')(x)

    x = Conv2D(4 * filters, 1, name=name+'_3_conv')(x)
    x = BatchNormalization(axis=bn_axis, epsilon=1.001e-5, name=name+'_3_bn')(x)

    x = Add(name=name+'_add')([shortcut, x])
    x = Activation('relu', name=name+'_out')(x)

    return x


def stack1(x, filters, blocks, stride1=2, name=None):
    """
    a set of stacked residual blocks

    :param x: input tensor
    :param filters: int, filters of the bottleneck layer in the block
    :param blocks: int, number of blocks in the stack
    :param stride1: stride of the first layer in the first block
    :param name: stack label
    :return: output tensor for the stacked blocks

    """

    x = block1(x, filters, stride=stride1, name=name+'_block1')

    for i in range(2, blocks+1):
        x = block1(x, filters, conv_shortcut=False, name=name+'_block'+str(i))

    return x

def resnet(height, width, depth, stack_fn, use_bias=False, nodes=256):
    """
    :param height: height of image, int
    :param width: image width, int
    :param depth: number of image channels, int
    :param stack_fn: function that stacks residual blocks
    :param use_bias: whether the first convolution uses a bias term, bool
    :param nodes: number of nodes in the dense layer at the top of the CNN, int
    :return: a Keras model instance
    """

    input_shape = (height, width, depth)

    img_input = Input(shape=input_shape)

    x = ZeroPadding2D(padding=((3, 3), (3, 3)), name='conv1_pad')(img_input)
    x = Conv2D(64, 7, strides=2, use_bias=use_bias, name='conv1_conv')(x)

    x = ZeroPadding2D(padding=((1, 1), (1, 1)), name='pool1_pad')(x)
    x = MaxPooling2D(3, strides=2, name='pool1_pool')(x)

    x = stack_fn(x)

    # top layer
    x = GlobalAveragePooling2D(name='avg_pool')(x)
    x = Dense(nodes, activation='relu')(x)

    # perform regression
    x = Dense(1, activation='linear')(x)

    model = Model(img_input, x)

    return model


def resnet50(height, width, depth, nodes):

    def stack_fn(x):
        x = stack1(x, 64, 3, stride1=1, name='conv2')
        x = stack1(x, 128, 4, name='conv3')
        x = stack1(x, 256, 6, name='conv4')
        x = stack1(x, 512, 3, name='conv5')
        return x

    return resnet(height, width, depth, stack_fn, nodes=nodes)

This can be used with some x_train, x_test, y_train, y_test data, where x_train and x_test are image data and y_train and y_test are numeric values on the interval [0, 1]:

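import numpy as np
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# load_images, quadruple_target, lossFN, target, Target and PATH_features are
# my own helpers and variables; they are not shown in this post.
scaler = MinMaxScaler()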
images = load_images(df=target, path=PATH_features, resize_shape=(224, 224), quadruple=True)
images = images / 255.0  # scale pixel data to [0, 1]
images = images.astype(np.float32)
imshape = images.shape

target = target[Target]
target = quadruple_target(target, target=Target)

x_train, x_test, y_train, y_test = train_test_split(images, target, test_size=0.3, random_state=101)

y_train = scaler.fit_transform(y_train)
y_test = scaler.transform(y_test)

model = resnet50(imshape[1], imshape[2], imshape[3], nodes=256)

opt = Adam(lr=1e-5, decay=1e-5 / 200)
model.compile(loss=lossFN, optimizer=opt)

history = model.fit(x_train, y_train, validation_data=(x_test, y_test), verbose=1, epochs=200)

pred = model.predict(x_test)
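Once predictions are available, one quick way to check whether the network has collapsed to a constant is to compare its error with that of always predicting the training mean, and to look at the spread of the predictions. A minimal NumPy sketch follows; `collapse_report` is a hypothetical helper and the numbers are made up for illustration:

```python
import numpy as np


def collapse_report(y_true, y_pred, y_train_mean):
    """Compare model error with the error of always predicting the training mean."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    model_mse = float(np.mean((y_true - y_pred) ** 2))
    baseline_mse = float(np.mean((y_true - y_train_mean) ** 2))
    return {
        "model_mse": model_mse,
        "baseline_mse": baseline_mse,
        # A spread near zero means every prediction is almost the same value.
        "pred_std": float(np.std(y_pred)),
    }


# Hypothetical values: a model that outputs the training mean for every input.
y_true = np.array([0.1, 0.4, 0.6, 0.9])
collapsed_pred = np.full(4, 0.5)

report = collapse_report(y_true, collapsed_pred, y_train_mean=0.5)
print(report)
```

If `model_mse` is no better than `baseline_mse` and `pred_std` is close to zero, the model has learned nothing beyond the mean, which matches the symptom described in the question.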
Answered by Taylr Cawte, May 16 '26 13:05