I am trying to train a fastText classifier on Windows using the fasttext Python package. I have a UTF-8 file with lines like
__label__type1 sample sentence 1
__label__type2 sample sentence 2
__label__type1 sample sentence 3
When I run
fasttext.supervised('data.train.txt','model', label_prefix='__label__', dim=300, epoch=50, min_count=1, ws=3, minn=4, pretrained_vectors='wiki.simple.vec')
I get the following error:
File "fasttext\fasttext.pyx", line 256, in fasttext.fasttext.supervised (fasttext/fasttext.cpp:7265)
File "fasttext\fasttext.pyx", line 182, in fasttext.fasttext.train_wrapper (fasttext/fasttext.cpp:5279)
ValueError: fastText: cannot load data.train.txt
And when I check the file types in my directory, I get
__pycache__: directory
data.train.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators
train.py: Python script, ASCII text executable, with CRLF line terminators
wiki.simple.vec: UTF-8 Unicode text, with very long lines, with CRLF line terminators
Also, when I train the same classifier with the same training file on macOS, it works fine. I am trying to understand why that txt file cannot be read on Windows.
Thanks!
TL;DR: Use the os module to safely construct paths, especially in Python 2
The error indicates that the file can't be loaded. Since the only difference between your environments is the operating system, the likely cause is that the file isn't being located properly, because each OS handles paths differently. I feel this is a mistake most Python programmers make at least once, because it's unexpected.
You can hardcode paths, but then you'll have a problem down the road if you ever use things cross platform. In my case, sometimes I develop something quickly in Windows, but then deploy large scale on a *nix platform.
I suggest instead getting used to using the os module, because it works across platforms. Someone said in a comment that they had a path of "myfolder\nfolder\tfolder", built by writing the string by hand instead of using the os module. In a normal Python string, that contains a newline (\n) and a tab (\t), so it is not the path they meant. Even when the folder names don't happen to start with n or t, hand-written Windows paths are fragile, because the backslash (\) is Python's escape character and has to be doubled (\\) or written in a raw string. Use os, and you don't have to think about that.
>>> import os
>>> os.getcwd()
'C:\\Python27'
>>> os.path.abspath(os.sep)
'C:\\'
>>> os.chdir(os.path.join(os.path.abspath(os.sep), "Users", "Jeff"))
>>> os.getcwd()
'C:\\Users\\Jeff'
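The escaping problem described above is easy to see directly. This is only a small sketch, not the commenter's actual code: the folder names come from the quoted path, and the variable names are made up for illustration.

```python
import os

# Hand-written path: "\n" becomes a newline and "\t" becomes a tab
bad = "myfolder\nfolder\tfolder"

# A raw string keeps the backslashes, but is still Windows-only
raw = r"myfolder\nfolder\tfolder"

# os.path.join inserts the right separator for the current OS
joined = os.path.join("myfolder", "nfolder", "tfolder")

print(repr(bad))
print(repr(raw))
print(joined)
```

Printing with repr() makes the hidden newline and tab visible, which is a quick way to check a path string that "looks right" in the source.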
Usually, you'll be using relative paths from your project root, not absolute paths. Those are easier; finding the root of the current OS is a little trickier (you can find that answer here).
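One way to make a relative path robust is to resolve it against a known base directory and check that the file exists before handing it to the library. The sketch below assumes the training file lives under that base directory; `resolve_data_path` is a hypothetical helper name, not part of fastText or the standard library.

```python
import os


def resolve_data_path(filename, base_dir=None):
    """Return an absolute path to filename under base_dir (hypothetical helper)."""
    if base_dir is None:
        # Assumption: fall back to the current working directory
        base_dir = os.getcwd()
    path = os.path.abspath(os.path.join(base_dir, filename))
    if not os.path.isfile(path):
        # Fail early with the full path, so it is obvious where Python looked
        raise IOError("training file not found: %s" % path)
    return path
```

In a script, passing `os.path.dirname(os.path.abspath(__file__))` as `base_dir` resolves the file relative to the script itself, and the returned string could then be given to `fasttext.supervised` in place of the bare 'data.train.txt'. Whether that resolves the error in the question is not something this sketch can confirm; it only removes the working directory as a variable.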
(I'm providing the full answer as we figured out from the comments)
Edit: Python 3 has pathlib, which this link says is better than os. I've never used Python 3, so I can't say.
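For comparison, a rough pathlib version of the same idea might look like the following. pathlib is in the Python 3 standard library; the file name is the one from the question, and using the current working directory as the base is an assumption for illustration.

```python
from pathlib import Path

# Assumption: the training file sits in the current working directory
base_dir = Path.cwd()

# The / operator joins path parts with the right separator for the OS
train_path = base_dir / "data.train.txt"

print(train_path)
print(train_path.exists())
```

If a library expects a plain string rather than a Path object, `str(train_path)` converts it.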