Torchtext 0.7 shows Field is being deprecated. What is the alternative?

Looks like the previous paradigm of declaring Fields and Examples and using BucketIterator is deprecated and will move to legacy in 0.8. However, I can't seem to find an example of the new paradigm for custom datasets (as in, not the ones included in torchtext.datasets) that doesn't use Field. Can anyone point me to an up-to-date example?

Reference for deprecation:

https://github.com/pytorch/text/releases

Paco asked Aug 22 '20 18:08



2 Answers

It took me a little while to find the solution myself. The new paradigm is like so for prebuilt datasets:

from torchtext.experimental.datasets import AG_NEWS
train, test = AG_NEWS(ngrams=3)

or like so for custom-built datasets, where train can be any dataset that yields (label, text) pairs:

from torch.utils.data import DataLoader

def collate_fn(batch):
    # Each sample in the batch is a (label, text) pair
    texts, labels = [], []
    for label, txt in batch:
        texts.append(txt)
        labels.append(label)
    return texts, labels

# train is the dataset created above (or your own dataset)
dataloader = DataLoader(train, batch_size=8, collate_fn=collate_fn)
for idx, (texts, labels) in enumerate(dataloader):
    print(idx, texts, labels)

I've copied the examples from the Source
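For a dataset of your own, the work Field used to do (tokenizing, building a vocabulary, numericalizing and padding) can be moved into the collate function. The following is only a rough sketch of that idea in plain Python, not something taken from the torchtext docs: the names tokenize, build_vocab and make_collate_fn, the whitespace tokenizer and the sample data are all made up for illustration.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # special tokens chosen for this sketch


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization is good enough here
    return text.lower().split()


def build_vocab(texts, min_freq=1):
    # Map each token to an integer id; ids 0 and 1 are reserved for PAD and UNK
    counts = Counter(tok for text in texts for tok in tokenize(text))
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


def make_collate_fn(vocab):
    def collate_fn(batch):
        # batch is a list of (label, text) pairs, as in the answer above
        labels = [label for label, _ in batch]
        ids = [[vocab.get(tok, vocab[UNK]) for tok in tokenize(text)]
               for _, text in batch]
        # Pad every sequence to the length of the longest one in the batch
        max_len = max(len(seq) for seq in ids)
        padded = [seq + [vocab[PAD]] * (max_len - len(seq)) for seq in ids]
        return padded, labels
    return collate_fn


# Hypothetical toy data: a list of (label, text) pairs
samples = [(0, "the cat sat"), (1, "dogs bark loudly at night"), (0, "the dog sat")]
vocab = build_vocab(text for _, text in samples)
collate_fn = make_collate_fn(vocab)
padded, labels = collate_fn(samples[:2])
print(padded, labels)
```

A plain list like samples should already work as a map-style dataset, so it can be passed as DataLoader(samples, batch_size=2, collate_fn=collate_fn); wrapping padded and labels in torch.tensor inside collate_fn would then give tensors ready for a model.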

Steven answered Sep 28 '22 10:09


Browsing through torchtext's GitHub repo I stumbled over the README in the legacy directory, which is not documented in the official docs. The README links a GitHub issue that explains the rationale behind the change as well as a migration guide.

If you just want to keep your existing code running with torchtext 0.9.0, where the deprecated classes have been moved to the legacy module, you have to adjust your imports:

# from torchtext.data import Field, TabularDataset
from torchtext.legacy.data import Field, TabularDataset

Alternatively, you can import the whole torchtext.legacy module as torchtext as suggested by the README:

import torchtext.legacy as torchtext
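
If the same code has to run against several torchtext versions, the import switch above can be wrapped in a small fallback. This is just a sketch under my own assumptions: import_first is a hypothetical helper, not part of torchtext, and it only tries the two module paths mentioned in this answer. On an installation where neither path provides Field, it ends up with None instead of raising.

```python
import importlib


def import_first(candidates):
    """Return the first module in candidates that can be imported, else None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module (or its parent package) is not available, try the next one
            continue
    return None


if __name__ == "__main__":
    # Prefer the legacy location (0.9.0), fall back to the old one (pre-0.9.0)
    data = import_first(["torchtext.legacy.data", "torchtext.data"])
    # Field is None if torchtext is missing or no longer provides the class
    Field = getattr(data, "Field", None)
    print("Field available:", Field is not None)
```

Code that depends on Field can then check for None once at startup instead of scattering version checks around.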

Tobias Uhmann answered Sep 28 '22 10:09