How to convert a list of strings into a tensor in pytorch?

Tags: python, numpy, pytorch

I am working on a classification problem in which I have a list of strings as class labels, and I want to convert them into a tensor. So far I have tried converting the list of strings into a NumPy array with np.array and then passing the result to torch.from_numpy:

truth = torch.from_numpy(np.array(truths))

but I get the following error:

RuntimeError: can't convert a given np.ndarray to a tensor - it has an invalid type. The only supported types are: double, float, int64, int32, and uint8.

Can anybody suggest an alternative approach? Thanks

406

asked Jun 18 '17 17:06

deepayan das

2 Answers

Unfortunately, you can't right now: PyTorch tensors hold only numeric and boolean data, not strings. I also don't think supporting strings would be a good idea, since it would make PyTorch clumsy. A popular workaround is to convert the labels into a numeric type using sklearn.

Here is a short example:

from sklearn import preprocessing
import torch

labels = ['cat', 'dog', 'mouse', 'elephant', 'pandas']

# LabelEncoder assigns integer ids following the sorted order of the unique labels
le = preprocessing.LabelEncoder()
targets = le.fit_transform(labels)
# targets: array([0, 1, 3, 2, 4])

targets = torch.as_tensor(targets)
# targets: tensor([0, 1, 3, 2, 4])

Since you may need to convert between the original labels and the encoded ones later, it is a good idea to keep the variable le around; its inverse_transform method maps the integers back to the strings.
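If you would rather not depend on sklearn, the same idea can be sketched in plain Python. This is only a minimal sketch: the names classes and class_to_idx are hypothetical and not part of the answer above, and sorting the unique labels is an assumption made here to mimic the ordering LabelEncoder uses.

```python
import torch

labels = ['cat', 'dog', 'mouse', 'elephant', 'pandas']

# Hypothetical names; sorted unique labels give a stable index order.
classes = sorted(set(labels))
class_to_idx = {name: i for i, name in enumerate(classes)}

# Encode: strings -> integer class indices -> tensor.
targets = torch.tensor([class_to_idx[name] for name in labels])

# Decode: tensor of indices -> original strings.
decoded = [classes[i] for i in targets.tolist()]
```

Here decoded should equal the original labels list, which is the same round trip that le.inverse_transform provides with the sklearn encoder.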

173

answered Sep 21 '22 20:09

Tengerye

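The trick is to first find the maximum encoded length of a word in the list, and then, in a second loop, copy each word into a zero-initialized tensor so that shorter words end up padded with zeros. Note that UTF-8 is a variable-length encoding: ASCII characters take one byte each, while the Hebrew letters in this example take two bytes each, so the length that matters is the byte length, not the character count.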

In[]
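import torch

words = ['שלום', 'beautiful', 'world']

# First pass: encode each word as a tensor of UTF-8 bytes and track the longest one
max_l = 0
ts_list = []
for w in words:
    ts_list.append(torch.ByteTensor(list(bytes(w, 'utf8'))))
    max_l = max(ts_list[-1].size()[0], max_l)

# Second pass: copy each word into a zero-initialized tensor; shorter rows keep trailing zeros
w_t = torch.zeros((len(ts_list), max_l), dtype=torch.uint8)
for i, ts in enumerate(ts_list):
    w_t[i, 0:ts.size()[0]] = ts
w_t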

Out[]
tensor([[215, 169, 215, 156, 215, 149, 215, 157,   0],
        [ 98, 101,  97, 117, 116, 105, 102, 117, 108],
        [119, 111, 114, 108, 100,   0,   0,   0,   0]], dtype=torch.uint8)
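To get the strings back, the padding has to be stripped before decoding. A minimal sketch, assuming no word itself contains a NUL byte (so trailing zeros are always padding); the helper name decode_rows is hypothetical and not part of the answer above:

```python
import torch

def decode_rows(w_t):
    """Turn a zero-padded uint8 tensor of UTF-8 bytes back into a list of strings."""
    words = []
    for row in w_t:
        # Drop the trailing zero padding, then decode the remaining bytes.
        raw = bytes(row.tolist()).rstrip(b'\x00')
        words.append(raw.decode('utf8'))
    return words

# Same values as the padded tensor shown in the output above.
w_t = torch.tensor([[215, 169, 215, 156, 215, 149, 215, 157, 0],
                    [98, 101, 97, 117, 116, 105, 102, 117, 108],
                    [119, 111, 114, 108, 100, 0, 0, 0, 0]], dtype=torch.uint8)
restored = decode_rows(w_t)
```

If the words could legitimately contain zero bytes, storing the byte lengths in a separate tensor would probably be a safer choice than relying on the padding value.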

answered Sep 21 '22 20:09

Serge Tochilov
