
fasttext cannot load training txt file

I am trying to train a fastText classifier on Windows using the fasttext Python package. I have a UTF-8 file with lines like

__label__type1 sample sentence 1
__label__type2 sample sentence 2
__label__type1 sample sentence 3 

When I run

fasttext.supervised('data.train.txt','model', label_prefix='__label__', dim=300, epoch=50, min_count=1, ws=3, minn=4, pretrained_vectors='wiki.simple.vec')

I get the following error

File "fasttext\fasttext.pyx", line 256, in fasttext.fasttext.supervised (fasttext/fasttext.cpp:7265)
  File "fasttext\fasttext.pyx", line 182, in fasttext.fasttext.train_wrapper (fasttext/fasttext.cpp:5279)
ValueError: fastText: cannot load data.train.txt

When I check the file types in my directory, I get

__pycache__:     directory
data.train.txt:  UTF-8 Unicode text, with very long lines, with CRLF line terminators
train.py:        Python script, ASCII text executable, with CRLF line terminators
wiki.simple.vec: UTF-8 Unicode text, with very long lines, with CRLF line terminators

Also, when I train the same classifier with the same training file on macOS, it works fine. I am trying to understand why that txt file cannot be read.

Thanks!

tahsintahsin asked Jun 18 '18 09:06


1 Answer

TL;DR: Use the os module to safely construct paths, especially in Python 2

The error indicates that the file can't be loaded. Since the only difference between your environments is the operating system, the clue is that you're not locating the file properly, because each OS handles paths differently. I feel this is a mistake most Python programmers make at least once, because it's unexpected.

You can hardcode paths, but then you'll have a problem down the road if you ever run things cross-platform. In my case, I sometimes develop something quickly on Windows but then deploy at large scale on a *nix platform.

I suggest instead getting used to the os module, because it works across platforms. The asker said in a comment that they had a path of "myfolder\nfolder\tfolder", built by writing the string by hand instead of using the os module. Python reads the \n and \t in that string as a newline and a tab. Even if the folder names hadn't started with n and t, a hand-written Windows path would still have been unreliable, because the backslash (\) is the escape character in Python string literals and has to be escaped (\\) or written in a raw string. Use os, and you don't have to know that.

>>> import os
>>> os.getcwd()
'C:\\Python27'
>>> os.path.abspath(os.sep)
'C:\\'
>>> os.chdir(os.path.join(os.path.abspath(os.sep), "Users", "Jeff"))
>>> os.getcwd()
'C:\\Users\\Jeff'
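
To make the escaping problem with "myfolder\nfolder\tfolder" concrete, here is a minimal sketch comparing the hand-written string, a raw string, and os.path.join. The folder names come from the comment mentioned above; the variable names are only for illustration.

```python
import os

# Hand-written path: Python turns \n into a newline and \t into a tab,
# so this is not the path it appears to be.
naive = "myfolder\nfolder\tfolder"

# Raw string: the backslashes are kept literally (Windows-only separator).
raw = r"myfolder\nfolder\tfolder"

# os.path.join inserts whatever separator the current OS uses.
joined = os.path.join("myfolder", "nfolder", "tfolder")

print(repr(naive))   # shows the embedded newline and tab escapes
print(repr(raw))     # shows literal backslashes
print(repr(joined))  # separator depends on the platform
```

Only the last form works unchanged on both Windows and *nix.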

Usually you'll be using relative paths from your project root rather than absolute paths. Those are easier; finding the root of the current OS is what's a little trickier (you can find that answer here)
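
As a rough sketch of the project-relative approach, the snippet below builds the path to the training file from the directory of the running script. It assumes data.train.txt sits next to train.py, as in the directory listing in the question; base_dir and train_path are names I made up, and the fallback to the current working directory is only there so the snippet also runs in an interactive session.

```python
import os

# Directory that contains this script. __file__ is not defined in an
# interactive session, so fall back to the current working directory.
try:
    base_dir = os.path.dirname(os.path.abspath(__file__))
except NameError:
    base_dir = os.getcwd()

# Assumed layout: the training file lives next to the script.
train_path = os.path.join(base_dir, "data.train.txt")

# Checking before training gives a clearer message than the
# "cannot load" error if the path is wrong.
if not os.path.isfile(train_path):
    print("training file not found: " + train_path)
```

The resulting train_path can then be passed as the first argument of the training call in place of the bare file name.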

(I'm providing the full answer as we figured it out in the comments)

Edit: Python 3 has pathlib, which this link says is better than os. I've never used Python 3, so I can't say.
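
For readers on Python 3.4 or later, a minimal pathlib sketch of the same idea might look like this. The "data" folder is a hypothetical layout, not something from the question; the pure path classes are used only to show how each OS flavour renders the same components.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# The / operator joins components with the separator of the current OS.
train_path = Path("data") / "data.train.txt"  # hypothetical layout
print(train_path)

# Pure paths do no file access, so both flavours can be shown on any OS.
windows_style = PureWindowsPath("myfolder", "nfolder", "tfolder")
posix_style = PurePosixPath("myfolder", "nfolder", "tfolder")
print(windows_style)
print(posix_style)
```

Libraries that only accept strings can be given str(train_path).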

Jeff Ellen answered Oct 03 '22 13:10