 

Faster K-Means Clustering in TensorFlow

Dear TensorFlow Community,

I'm training a clustering model with tf.contrib.factorization.KMeansClustering, but training is really slow and only uses 1% of my GPU.

Meanwhile, my 4 CPU cores sit at a constant ~35% utilization.

Is K-Means implemented mainly for the CPU rather than the GPU?

Is there a way I can shift more of the computation to the GPU, or some other approach to speed up training?
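In case it helps diagnose this, one way I can see where the ops actually land is to turn on device placement logging through the Estimator's RunConfig (a rough sketch, assuming TF 1.x; the hyperparameters just mirror the script below):

import tensorflow as tf

# Log which device (CPU or GPU) each op gets placed on.
run_config = tf.estimator.RunConfig(
    session_config=tf.ConfigProto(log_device_placement=True)
)

KMeansEstimator = tf.contrib.factorization.KMeansClustering(
    num_clusters=500,
    feature_columns=[tf.feature_column.numeric_column(
        key='feats', dtype=tf.float64, shape=(377,))],
    use_mini_batch=True,
    config=run_config,  # op placement is printed to the log during training
)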

Below is my training script (Python 3).

Thank you for your time.

import tensorflow as tf


def parser(record):
    # Each record stores the feature vector as a raw float64 byte string.
    features = {
        'feats': tf.FixedLenFeature([], tf.string),
    }

    parsed = tf.parse_single_example(record, features)
    feats = tf.convert_to_tensor(tf.decode_raw(parsed['feats'], tf.float64))

    return {'feats': feats}


def my_input_fn(tfrecords_path):
    # Read, parse, and batch the TFRecords.
    dataset = (
        tf.data.TFRecordDataset(tfrecords_path)
        .map(parser)
        .batch(1024)
    )

    iterator = dataset.make_one_shot_iterator()
    batch_feats = iterator.get_next()

    return batch_feats


### SPEC FUNCTIONS ###

train_spec_kmeans = tf.estimator.TrainSpec(
    input_fn=lambda: my_input_fn('/home/ubuntu/train.tfrecords'),
    max_steps=10000,
)
eval_spec_kmeans = tf.estimator.EvalSpec(
    input_fn=lambda: my_input_fn('/home/ubuntu/eval.tfrecords'),
)



### INIT ESTIMATOR ###

KMeansEstimator = tf.contrib.factorization.KMeansClustering(
    num_clusters=500,
    feature_columns = [tf.feature_column.numeric_column(
        key='feats',
        dtype=tf.float64,
        shape=(377,),
    )],
    use_mini_batch=True)


### TRAIN & EVAL ###

tf.estimator.train_and_evaluate(KMeansEstimator, train_spec_kmeans, eval_spec_kmeans)

Best, Josh

asked Jun 19 '18 by JR Meyer


2 Answers

Here's my best answer so far, with timing information, building on Eliethesaiyan's answer and the linked docs.

My original Dataset code block and its performance:

dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parser)
    .batch(1024)
)

real    1m36.171s
user    2m57.756s
sys     0m42.304s

Eliethesaiyan's answer (prefetch + num_parallel_calls):

import multiprocessing  # needed for cpu_count()

dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parser, num_parallel_calls=multiprocessing.cpu_count())
    .batch(1024)
    .prefetch(1024)
)

real  0m41.450s
user  1m33.120s
sys   0m18.772s

From the docs, using map_and_batch + num_parallel_batches + prefetch:

dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .apply(
        tf.contrib.data.map_and_batch(
            map_func=parser,
            batch_size=1024,
            num_parallel_batches=multiprocessing.cpu_count(),
        )
    )
    .prefetch(1024)
)

real   0m32.855s
user   1m11.412s
sys    0m10.408s
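For completeness, here is roughly what the fastest variant looks like dropped back into the original input function (a sketch, assuming the same parser and TFRecord layout as in the question):

import multiprocessing

import tensorflow as tf


def my_input_fn(tfrecords_path):
    # Fuse the map and batch steps, parallelize parsing across CPU cores,
    # and prefetch so the input pipeline prepares batches while training runs.
    dataset = (
        tf.data.TFRecordDataset(tfrecords_path)
        .apply(
            tf.contrib.data.map_and_batch(
                map_func=parser,  # the parse function defined in the question
                batch_size=1024,
                num_parallel_batches=multiprocessing.cpu_count(),
            )
        )
        .prefetch(1024)
    )

    iterator = dataset.make_one_shot_iterator()
    return iterator.get_next()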
answered Oct 11 '22 by JR Meyer


One of the things I have seen increase GPU and CPU usage is using prefetch on the dataset. It lets the dataset producer fetch the next data while the model is still consuming the previous batch, thereby maximizing resource usage. Specifying the maximum number of CPU cores for the map step also speeds things up. I would restructure it this way:

import multiprocessing  # for cpu_count()

dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parser, num_parallel_calls=multiprocessing.cpu_count())
    .batch(1024)
)

# prefetch lets the input pipeline prepare the next batch while the model
# is still consuming the current one.
dataset = dataset.prefetch(1024)

There is a nice guide on best practices for using TFRecords here.

answered Oct 11 '22 by Eliethesaiyan