Unicode in the standard TensorFlow format

Tags: python, unicode, tensorflow, protocol-buffers

Following the documentation here, I am trying to create features from unicode strings. Here is what the feature creation method looks like:

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

Calling it with a unicode string raises the following exception:

  File "/home/rklopfer/.virtualenvs/tf/local/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 512, in init
    copy.extend(field_value)
  File "/home/rklopfer/.virtualenvs/tf/local/lib/python2.7/site-packages/google/protobuf/internal/containers.py", line 275, in extend
    new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
  File "/home/rklopfer/.virtualenvs/tf/local/lib/python2.7/site-packages/google/protobuf/internal/type_checkers.py", line 108, in CheckValue
    raise TypeError(message)
TypeError: u'Gross' has type <type 'unicode'>, but expected one of: (<type 'str'>,)

Naturally, if I wrap the value in str(), it fails on the first non-ASCII character it encounters.

asked Aug 15 '16 19:08

Russell

1 Answer

The BytesList definition is in feature.proto, where its value field is of type repeated bytes. This means you need to pass it something that is convertible to a list of byte sequences.

There is more than one way to turn a unicode string into bytes, hence the ambiguity: no encoding is chosen for you. You can do the encoding manually instead. For example, to use UTF-8 encoding:

value.encode("utf-8")
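To make the ambiguity concrete, here is a minimal sketch using only the standard library. The helper name to_utf8_bytes and the sample string are hypothetical and not part of TensorFlow or of the original question. It shows that the same text yields different bytes under different encodings, and it wraps the explicit UTF-8 choice in one place.

```python
def to_utf8_bytes(value):
    """Return `value` as UTF-8 bytes; byte strings pass through unchanged.

    Hypothetical helper for illustration, not a TensorFlow API.
    """
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


# Sample text (an assumption): "Gro" followed by U+00DF, one non-ASCII character.
text = u"Gro\u00df"

# The same text maps to different byte sequences under different encodings,
# which is why an encoding has to be chosen explicitly.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# Encode once before building the feature, and decode with the same
# encoding when reading the value back.
encoded = to_utf8_bytes(text)
decoded = encoded.decode("utf-8")
```

The encoded value can then be passed to _bytes_feature as before. Whatever encoding is used when writing has to be used again when decoding the bytes on the reading side.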

answered Oct 12 '22 18:10

Yaroslav Bulatov
