I am working on a classification problem in which I have a list of strings as class labels, and I want to convert them into a tensor. So far I have tried converting the list of strings into a NumPy array using the np.array function provided by the numpy module:
import numpy as np
import torch
truth = torch.from_numpy(np.array(truths))
but I am getting the following error.
RuntimeError: can't convert a given np.ndarray to a tensor - it has an invalid type. The only supported types are: double, float, int64, int32, and uint8.
Can anybody suggest an alternative approach? Thanks
To convert a Python list to a tensor in TensorFlow, use the tf.convert_to_tensor() function, which converts the given object (here, a Python list) into a tensor. Unlike PyTorch, TensorFlow has a string dtype (tf.string), so this also works for a list of strings.
In PyTorch, torch.as_tensor converts data into a tensor, sharing data and preserving autograd history if possible.
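A minimal sketch of torch.as_tensor on NumPy input: a numeric array has a supported dtype, so the tensor shares its memory, while an array of strings is rejected just like in the question. The exact exception type has changed across PyTorch versions, so catching several types here is an assumption made to keep the sketch portable.

```python
import numpy as np
import torch

# A numeric (int64) array has a supported dtype, so as_tensor shares its memory
arr = np.array([0, 1, 2, 3], dtype=np.int64)
t = torch.as_tensor(arr)
arr[0] = 42  # modify the NumPy array in place
print(t)     # the change is visible through the tensor

# A NumPy array of strings has a unicode dtype ('<U5'), which PyTorch cannot represent
labels = np.array(['cat', 'dog', 'mouse'])
try:
    torch.as_tensor(labels)
    string_conversion_failed = False
except (TypeError, RuntimeError, ValueError) as e:
    # The exception type varies between PyTorch versions; catch broadly to be safe
    string_conversion_failed = True
    print('conversion failed:', e)
```

This is why the labels have to be turned into numbers before they can become a tensor.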
In TensorFlow, a NumPy array created with np.array() can likewise be converted to a tensor with tf.convert_to_tensor().
Unfortunately, you can't right now: PyTorch tensors have no string dtype. I also don't think it would be a good idea, since it would make PyTorch clumsy. A popular workaround is to convert the labels into numeric types using scikit-learn (sklearn).
Here is a short example:
from sklearn import preprocessing
import torch
labels = ['cat', 'dog', 'mouse', 'elephant', 'pandas']
le = preprocessing.LabelEncoder()
# fit_transform learns the classes (sorted alphabetically) and returns integer codes
targets = le.fit_transform(labels)
# targets: array([0, 1, 3, 2, 4])
targets = torch.as_tensor(targets)
# targets: tensor([0, 1, 3, 2, 4])
Since you may need to convert between the original labels and the encoded ones (for example with le.inverse_transform), it is a good idea to keep the le object around.
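If you would rather not depend on scikit-learn, the same integer encoding can be sketched with a plain dictionary. The helper name build_label_maps is hypothetical; sorting the unique labels mimics LabelEncoder's alphabetical class order, and the reverse lookup through classes plays the role of le.inverse_transform.

```python
import torch

def build_label_maps(labels):
    """Hypothetical helper: map each unique label to an index (sorted, like LabelEncoder)."""
    classes = sorted(set(labels))
    label_to_idx = {label: i for i, label in enumerate(classes)}
    return classes, label_to_idx

labels = ['cat', 'dog', 'mouse', 'elephant', 'pandas']
classes, label_to_idx = build_label_maps(labels)

# Encode: strings -> int64 tensor, the dtype expected for class targets
targets = torch.tensor([label_to_idx[label] for label in labels], dtype=torch.long)

# Decode: indices -> original strings (the equivalent of le.inverse_transform)
decoded = [classes[i] for i in targets.tolist()]
print(targets)
print(decoded)
```

As with le, keep classes (or label_to_idx) around so predictions can be mapped back to label names.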
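Another option is to store the raw UTF-8 bytes of each string. The trick is to first find the maximum length (in bytes) of a word in the list, and then, in a second loop, populate the tensor, padding with zeros. Note that UTF-8 uses one to four bytes per character: ASCII letters take one byte, while the Hebrew letters below take two.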
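import torch

words = ['שלום', 'beautiful', 'world']

# First pass: encode each word as UTF-8 bytes and track the longest byte length
max_l = 0
ts_list = []
for w in words:
    ts_list.append(torch.tensor(list(w.encode('utf-8')), dtype=torch.uint8))
    max_l = max(ts_list[-1].size(0), max_l)

# Second pass: copy each byte sequence into a zero-padded (num_words, max_l) tensor
w_t = torch.zeros((len(ts_list), max_l), dtype=torch.uint8)
for i, ts in enumerate(ts_list):
    w_t[i, :ts.size(0)] = ts

print(w_t)

Output: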
tensor([[215, 169, 215, 156, 215, 149, 215, 157, 0],
[ 98, 101, 97, 117, 116, 105, 102, 117, 108],
[119, 111, 114, 108, 100, 0, 0, 0, 0]], dtype=torch.uint8)
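The two loops above can be condensed into a helper that also returns each string's byte length, so the padded tensor can be decoded back into strings. This is a sketch: the function names are hypothetical, and it swaps the manual padding loop for torch.nn.utils.rnn.pad_sequence with batch_first=True.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def strings_to_byte_tensor(words):
    """Hypothetical helper: UTF-8 encode each string and zero-pad into a 2-D uint8 tensor."""
    seqs = [torch.tensor(list(w.encode('utf-8')), dtype=torch.uint8) for w in words]
    lengths = torch.tensor([len(s) for s in seqs], dtype=torch.long)
    padded = pad_sequence(seqs, batch_first=True, padding_value=0)
    return padded, lengths

def byte_tensor_to_strings(padded, lengths):
    """Hypothetical helper: undo strings_to_byte_tensor using the stored byte lengths."""
    return [bytes(row[:n].tolist()).decode('utf-8')
            for row, n in zip(padded, lengths.tolist())]

words = ['שלום', 'beautiful', 'world']
w_t, lengths = strings_to_byte_tensor(words)
print(w_t)
print(byte_tensor_to_strings(w_t, lengths))
```

Keeping the lengths avoids having to guess where the zero padding starts when decoding.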