
Convert python sequence with multiple datatypes to tensor

I'm using TensorFlow r1.7 and Python 3.6.5. I am also very new to TensorFlow, so I'd like easy-to-read explanations if possible.

I'm trying to convert my input data into a dataset of tensors with tf.data.Dataset.from_tensor_slices(), passing it my tuple of mixed-type data. However, when I run my code I get this error: ValueError: Can't convert Python sequence with mixed types to Tensor.

I want to know why I am receiving this error, and how I can convert my data to a dataset of tensors even with mixed datatypes.

Here's a printout of the first 5 entries in my tuple.

(13501, 2, None, 51, '2232', 'S35', '734.72', 'CLA', '240', 1035, 2060, 1252, 1182, 10, '967.28', '338.50', None, 14, 102, 3830)
(15124, 2, None, 57, '2641', 'S35', '234.80', 'DDA', '240', 743, 1597, 4706, 156, 0, None, None, None, 3, 27, 981)
(40035, 2, None, None, '21', 'K00', '60.06', 'CHK', '520', 76, 1863, 12, None, 1, '85.06', '25.00', None, 1, 5, 245)
(42331, 3, None, 62, '121', 'S50', '1859.01', 'ACT', '420', 952, 1583, 410, 255, 0, None, None, None, 6, 117, 1795)
(201721, 3, None, 42, '2472', 'S35', '1413.84', 'CLA', '350', 868, 1746, 963, 264, 0, None, None, None, 18, 65, 4510)

As you can see, my input data mixes integers, strings (some of which hold decimal numbers, such as '734.72'), and None values.

Here is a traceback of the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/miikey101/Documents/Khalen_Case_Loader/tensorflow/k_means/k_means.py", line 10, in prepare_dataset
    dataset = tf.data.Dataset.from_tensor_slices(dm_data)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 222, in from_tensor_slices
    return TensorSliceDataset(tensors)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1017, in __init__
    for i, t in enumerate(nest.flatten(tensors))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1017, in <listcomp>
    for i, t in enumerate(nest.flatten(tensors))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 950, in convert_to_tensor
    as_ref=False)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1040, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 185, in constant
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 131, in convert_to_eager_tensor
    return ops.EagerTensor(value, context=handle, device=device, dtype=dtype)
ValueError: Can't convert Python sequence with mixed types to Tensor.
asked Apr 13 '18 20:04 by Michael


2 Answers

In TensorFlow, a single tensor cannot have more than one data type.

Quoting the documentation:

It is not possible to have a tf.Tensor with more than one data type. It is possible, however, to serialize arbitrary data structures as strings and store those in tf.Tensors.

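Hence, one workaround is to create a tensor of data type tf.string and, when needed, convert each field to the desired data type. Note that tf.cast cannot convert strings to numbers; use tf.string_to_number (TF 1.x) or tf.strings.to_number (TF 2.x) instead.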
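A minimal plain-Python sketch of the data-preparation side of this workaround, assuming missing values are encoded as the empty string (a hypothetical choice). Every field becomes a str, so the whole table shares one type (what a tf.string tensor would hold), and a column is parsed back to a number only when it is used. The helper names `to_string_rows` and `parse_column` are illustrative, and the rows are the first seven fields of the first two sample rows from the question.

```python
def to_string_rows(rows):
    """Convert every field to str so all values share one type (like a tf.string tensor).

    Assumption: None is encoded as the empty string "".
    """
    return [tuple("" if v is None else str(v) for v in row) for row in rows]


def parse_column(string_rows, index, convert=float, default=0.0):
    """Parse one column back to a numeric type, using `default` for empty (missing) fields."""
    return [convert(row[index]) if row[index] != "" else default for row in string_rows]


# First seven fields of the first two sample rows in the question.
rows = [
    (13501, 2, None, 51, '2232', 'S35', '734.72'),
    (15124, 2, None, 57, '2641', 'S35', '234.80'),
]
string_rows = to_string_rows(rows)
amounts = parse_column(string_rows, 6)                      # decimal strings -> floats
ids = parse_column(string_rows, 0, convert=int, default=0)  # id strings -> ints
print(amounts, ids)
```

Inside a TensorFlow pipeline, the parsing step would instead run on the string tensor with tf.string_to_number or tf.strings.to_number, which expect parseable numeric strings, so missing fields would still need a placeholder value first.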

answered Oct 12 '22 22:10 by nessuno


You want a separate tensor for each of your features (columns). Only a multi-dimensional feature (such as an image, a video, a list of strings, or a vector) needs extra dimensions in its tensor, and even then all of its values share the same data type.

tf.data.Dataset.from_tensor_slices() will accept your input as a dictionary of lists (the key is the feature name, the value is the list of that feature's values), or as a tuple of lists with one list per feature. I can't remember whether it accepts Pandas DataFrames directly, but if it doesn't, you can easily convert one to a dictionary of lists with df.to_dict(orient='list').
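As a sketch of the "one list per feature" layout, the plain-Python helper below transposes row tuples into a dictionary of columns and reports which Python types each column holds. The feature names are hypothetical, and the rows reuse fields 1, 2, 5, 6, 7 and 8 of the first three sample rows in the question (fields that contain no None there). Each resulting column holds a single type, which is what a single tensor needs.

```python
def rows_to_columns(rows, names):
    """Transpose row tuples into a dict mapping each feature name to a list of its values."""
    columns = {name: [] for name in names}
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row has %d fields, expected %d" % (len(row), len(names)))
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


def column_types(columns):
    """Report the set of Python type names in each column (one type per column means one dtype)."""
    return {name: {type(v).__name__ for v in values} for name, values in columns.items()}


# Hypothetical feature names for fields 1, 2, 5, 6, 7 and 8 of the question's rows.
names = ["id", "group", "code", "category", "amount", "kind"]
rows = [
    (13501, 2, '2232', 'S35', '734.72', 'CLA'),
    (15124, 2, '2641', 'S35', '234.80', 'DDA'),
    (40035, 2, '21', 'K00', '60.06', 'CHK'),
]
columns = rows_to_columns(rows, names)
print(column_types(columns))  # every column holds a single type
```

Passing `columns` to tf.data.Dataset.from_tensor_slices(columns) should then produce one tensor per feature, each with its own dtype (not run here). If "amount" should be numeric rather than a string feature, it could be converted first, e.g. `[float(v) for v in columns["amount"]]`.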

However, you can't input None values; you will have to replace them before converting to a tensor. Classic approaches are the median value, zero, the most common value, a "missing"/"unknown" value for strings or categories, or imputation.
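A minimal sketch of the simpler replacement strategies listed above, applied to one column (list) at a time in plain Python. The function name `fill_missing`, its strategy names and the default "unknown" placeholder are illustrative assumptions; model-based imputation is not covered. The example columns are the 4th and 13th fields of the question's five sample rows.

```python
from collections import Counter
from statistics import median

STRATEGIES = ("median", "zero", "most_common", "placeholder")


def fill_missing(values, strategy="median", placeholder="unknown"):
    """Return a copy of `values` with every None replaced according to `strategy`."""
    if strategy not in STRATEGIES:
        raise ValueError("unknown strategy: %r" % strategy)
    present = [v for v in values if v is not None]
    if strategy == "zero":
        fill = 0
    elif strategy == "placeholder":  # for string or categorical columns
        fill = placeholder
    elif not present:
        raise ValueError("cannot use %r: column has no non-missing values" % strategy)
    elif strategy == "median":
        fill = median(present)
    else:  # "most_common"; Counter breaks ties by first occurrence (Python 3.7+)
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


col_4 = [51, 57, None, 62, 42]            # 4th field of each sample row
col_13 = [1182, 156, None, 255, 264]      # 13th field of each sample row
print(fill_missing(col_4))                # [51, 57, 54.0, 62, 42]
print(fill_missing(col_13, "zero"))       # [1182, 156, 0, 255, 264]
```

The fill value should be computed from training data only and reused for validation and test data, so all splits are treated the same way.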

answered Oct 13 '22 00:10 by grofte