Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Error using langdetect in python: "No features in text"

Hey I have a csv with multilingual text. All I want is a column appended with a the language detected. So I coded as below,

from langdetect import detect 
import csv
with open('C:\\Users\\dell\\Downloads\\stdlang.csv') as csvinput:
with open('C:\\Users\\dell\\Downloads\\stdlang.csv') as csvoutput:
writer = csv.writer(csvoutput, lineterminator='\n')
reader = csv.reader(csvinput)

    all = []
    row = next(reader)
    row.append('Lang')
    all.append(row)

    for row in reader:
        row.append(detect(row[0]))
        all.append(row)

    writer.writerows(all)

But I am getting the error as LangDetectException: No features in text

The traceback is as follows

runfile('C:/Users/dell/.spyder2-py3/temp.py', wdir='C:/Users/dell/.spyder2-py3')
Traceback (most recent call last):

  File "<ipython-input-25-5f98f4f8be50>", line 1, in <module>
    runfile('C:/Users/dell/.spyder2-py3/temp.py', wdir='C:/Users/dell/.spyder2-py3')

  File "C:\Users\dell\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
    execfile(filename, namespace)

  File "C:\Users\dell\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/dell/.spyder2-py3/temp.py", line 21, in <module>
    row.append(detect(row[0]))

  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector_factory.py", line 130, in detect
    return detector.detect()

  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 136, in detect
    probabilities = self.get_probabilities()

  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 143, in get_probabilities
    self._detect_block()

  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 150, in _detect_block
    raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')

LangDetectException: No features in text.

This is how my csv looks like 1)skunkiest smokiest yummiest strain pain killer and mood lifter 2)Relaxation, euphorique, surélevée, somnolence, concentré, picotement, une augmentation de l’appétit, soulager la douleur Giggly, physique, esprit sédation 3)Reduzierte Angst, Ruhe, gehobener Stimmung, zerebrale Energie, Körper Sedierung 4)Calmante, relajante muscular, Relajación Mental, disminución de náuseas 5)重いフルーティーな幸せ非常に強力な頭石のバースト

Please help me with this.

like image 850
user7140275 Avatar asked Nov 24 '16 10:11

user7140275


1 Answers

You can use something like this to detect which line in your file is throwing the error:

for row in reader:
    try:
        language = detect(row[0])
    except:
        language = "error"
        print("This row throws and error:", row[0])
    row.append(language)
    all.append(row)

What you're going to see is that it probably fails at "重いフルーティーな幸せ非常に強力な頭石のバースト". My guess is that detect() isn't able to 'identify' any characters to analyze in that row, which is what the error implies.

Other things, like when the input is only a URL, also cause this error.

like image 164
Mark Cramer Avatar answered Sep 19 '22 13:09

Mark Cramer