Dear TensorFlow Community,
I'm training a classifier with tf.contrib.factorization.KMeansClustering, but training goes really slowly and uses only 1% of my GPU, while my 4 CPU cores hold steady at about 35% utilization.
Is it the case that K-Means is written more for the CPU than the GPU?
Is there a way I can shift more of the computation to the GPU, or some other approach to speed up training?
Below is my training script (Python 3).
Thank you for your time.
import tensorflow as tf

def parser(record):
    # Each record stores the feature vector as a raw byte string.
    features = {
        'feats': tf.FixedLenFeature([], tf.string),
    }
    parsed = tf.parse_single_example(record, features)
    # Decode the bytes back into a float64 tensor.
    feats = tf.convert_to_tensor(tf.decode_raw(parsed['feats'], tf.float64))
    return {'feats': feats}

def my_input_fn(tfrecords_path):
    dataset = (
        tf.data.TFRecordDataset(tfrecords_path)
        .map(parser)
        .batch(1024)
    )
    iterator = dataset.make_one_shot_iterator()
    batch_feats = iterator.get_next()
    return batch_feats
### SPEC FUNCTIONS ###
train_spec_kmeans = tf.estimator.TrainSpec(
    input_fn=lambda: my_input_fn('/home/ubuntu/train.tfrecords'),
    max_steps=10000)
eval_spec_kmeans = tf.estimator.EvalSpec(
    input_fn=lambda: my_input_fn('/home/ubuntu/eval.tfrecords'))
### INIT ESTIMATOR ###
KMeansEstimator = tf.contrib.factorization.KMeansClustering(
    num_clusters=500,
    feature_columns=[tf.feature_column.numeric_column(
        key='feats',
        dtype=tf.float64,
        shape=(377,),
    )],
    use_mini_batch=True)
### TRAIN & EVAL ###
tf.estimator.train_and_evaluate(KMeansEstimator, train_spec_kmeans, eval_spec_kmeans)
Best, Josh
Here's my best answer so far, with timing information, building off of Eliethesaiyan's answer and his link to the docs.
My original Dataset code block and its performance:
dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parse_fn)
    .batch(1024)
)
real 1m36.171s
user 2m57.756s
sys 0m42.304s
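For context, those figures look like output from the Unix time command. A minimal harness along the following lines (parse_fn mirrors the parser from the question; the path is the question's train file) can time the input pipeline in isolation, without the K-Means training step:
import tensorflow as tf

def parse_fn(record):
    # Same parsing logic as the question's parser().
    parsed = tf.parse_single_example(
        record, {'feats': tf.FixedLenFeature([], tf.string)})
    return {'feats': tf.decode_raw(parsed['feats'], tf.float64)}

# Drain the pipeline once, discarding the batches; run this script
# under the Unix `time` command to measure input throughput alone.
dataset = (
    tf.data.TFRecordDataset('/home/ubuntu/train.tfrecords')
    .map(parse_fn)
    .batch(1024)
)
next_batch = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    while True:
        try:
            sess.run(next_batch)  # materialize each batch
        except tf.errors.OutOfRangeError:
            break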
Eliethesaiyan's answer (prefetch + num_parallel_calls):
import multiprocessing

dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parse_fn, num_parallel_calls=multiprocessing.cpu_count())
    .batch(1024)
    .prefetch(1024)
)
real 0m41.450s
user 1m33.120s
sys 0m18.772s
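One caveat: because prefetch() is applied after batch(), its argument is counted in batches, not records, so prefetch(1024) asks for a buffer of up to 1024 batches of 1024 records each. A much smaller buffer usually gives the same producer/consumer overlap with far less memory; a variant worth trying (same pipeline, only the buffer size changed):
import multiprocessing
import tensorflow as tf

dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parse_fn, num_parallel_calls=multiprocessing.cpu_count())
    .batch(1024)
    .prefetch(1)  # buffer one *batch* ahead; the unit here is batches
)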
From the docs, using map_and_batch + num_parallel_batches + prefetch:
dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .apply(
        tf.contrib.data.map_and_batch(
            map_func=parse_fn,
            batch_size=1024,
            num_parallel_batches=multiprocessing.cpu_count()
        )
    )
    .prefetch(1024)
)
real 0m32.855s
user 1m11.412s
sys 0m10.408s
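On newer 1.x releases (1.13+), tf.data can also choose the parallelism and prefetch depth for you via tf.data.experimental.AUTOTUNE. A sketch of the same idea with autotuning; I haven't timed this one, so treat it as an assumption worth benchmarking:
import tensorflow as tf

dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parse_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .batch(1024)
    .prefetch(tf.data.experimental.AUTOTUNE)  # runtime tunes the buffer
)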
One of the things I saw that increases GPU and CPU usage is using prefetch on the dataset. It keeps the dataset producer fetching data while the model is consuming the previous batch, thereby maximizing resource usage. Also, specifying your CPU count for num_parallel_calls speeds up the process. I would restructure it this way:
import multiprocessing

dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parser, num_parallel_calls=multiprocessing.cpu_count())
    .batch(1024)
)
dataset = dataset.prefetch(1024)
Here is a nice guide on best practices for using TFRecords here.
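As for the original question of whether K-Means runs on the GPU at all: you can ask TensorFlow to log device placement and see where the KMeansClustering ops actually land. A minimal sketch, assuming the standard Estimator RunConfig wiring (treat the exact arguments as illustrative):
import tensorflow as tf

# Print each op's device assignment (CPU vs GPU) to stderr as the
# graph is placed, so you can see how much of K-Means uses the GPU.
session_config = tf.ConfigProto(log_device_placement=True)
run_config = tf.estimator.RunConfig(session_config=session_config)

KMeansEstimator = tf.contrib.factorization.KMeansClustering(
    num_clusters=500,
    feature_columns=[tf.feature_column.numeric_column(
        key='feats', dtype=tf.float64, shape=(377,))],
    use_mini_batch=True,
    config=run_config)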