
Is it possible to use multiple dictionaries in enchant?

Is there any way I can use multiple dictionaries in enchant? This is what I do:

import enchant
d = enchant.Dict("en_US")
d.check("materialise")
>> False

But if I use enchant.Dict("en_UK"), I get True. What is the best way to combine multiple dictionaries, so that check() returns True whether the input is materialise or materialize?

Mass17 asked Oct 24 '19 11:10


3 Answers

@Mass17 that is actually not correct. The expression "en_US" and "en_UK" is a logical AND operation on two strings, and its result is simply "en_UK". Here's how the and operator works in that expression: (1) any non-empty string is considered truthy; (2) if the left operand is truthy, the right operand is evaluated and returned as the result. Read about Python's short-circuit evaluation for some insight into why it works this way.

So:

>>> "en_US" and "en_UK"
'en_UK'

And note, if you switch the order of the strings:

>>> "en_UK" and "en_US"
'en_US'

The words "materialise" and "materialize" BOTH appear in your "en_UK" dictionary, hence the results you got. You haven't actually "combined" the 2 dictionaries yet.
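
To actually combine them, one option is to keep one dictionary object per dialect and accept a word if any of them accepts it. Below is a minimal sketch, not a definitive implementation: check_any and WordSet are hypothetical names introduced here, and WordSet is only a stand-in exposing the same check() method as enchant.Dict, so the example runs without any dictionaries installed. Note that British English is normally tagged en_GB.

```python
def check_any(word, dictionaries):
    """Return True if at least one dictionary accepts the word."""
    return any(d.check(word) for d in dictionaries)


class WordSet:
    """Hypothetical stand-in with the same check() method as enchant.Dict."""

    def __init__(self, words):
        self._words = set(words)

    def check(self, word):
        return word in self._words


# Tiny illustrative word lists (assumed for the demo, not real dictionary data).
us = WordSet({"materialize"})
gb = WordSet({"materialise", "materialize"})

print(check_any("materialise", [us]))      # only the US list is consulted
print(check_any("materialise", [us, gb]))  # either list may accept the word
```

With pyenchant you would pass real dictionaries instead, for example check_any("materialise", [enchant.Dict("en_US"), enchant.Dict("en_GB")]), assuming both are installed.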

SethCamd answered Oct 12 '22 23:10


I may be late here, but this question intrigued me too.

So, the solution for using multiple dialects of the English language in Python's enchant is as below (this assumes your enchant installation provides a generic "en" dictionary):

    import enchant

    # Use plain "en" to cover all available dialects and word usages of the English language
    d = enchant.Dict("en")

    d.check("materialise")  # UK (en_GB)
    # True

    d.check("materialize")  # USA (en_US)
    # True

Hope this helps future readers :)

Huzy answered Oct 13 '22 00:10


For Hunspell dictionaries there's a workaround, provided both dictionaries share the same .aff file; I suppose en_US and en_GB meet that condition.

The Bash script (dic_combine.sh), written by Sergey Kurakin, is as follows:

#!/bin/bash

# Combines two or more hunspell dictionaries.
# (C) 2010 Sergey Kurakin <kurakin_at_altlinux_dot_org>

# Attention! All source dictionaries MUST share the same affix file.

# Usage: dic_combine source1.dic source2.dic [source3.dic...] > combined.dic

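TEMPFILE=$(mktemp)

# Merge all inputs, drop duplicate entries, the numeric count lines and blank lines
cat "$@" | sort --unique | sed -r 's|^[0123456789]*$||;/^$/d' > "$TEMPFILE"

# Print the new entry count first, then the entries themselves
wc -l < "$TEMPFILE"
cat "$TEMPFILE"
rm -f "$TEMPFILE"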

So, you have to put those dictionary files in a directory and run:

$ dic_combine en_US.dic en_GB.dic > en.dic
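
Since the question is about Python, the same merge can be sketched there too. This is only an illustration under stated assumptions, not part of the original script: combine_dic is a hypothetical helper name, the files are assumed to be UTF-8 (some Hunspell dictionaries use other encodings, so pass encoding= accordingly), and the two tiny .dic files are made up for the demo. As with the Bash version, entries that differ only in their affix flags are kept as separate lines.

```python
import re
import tempfile
from pathlib import Path


def combine_dic(paths, encoding="utf-8"):
    """Merge Hunspell .dic files: unique entries, with the entry count as first line."""
    entries = set()
    for path in paths:
        for line in Path(path).read_text(encoding=encoding).splitlines():
            # Skip blank lines and the purely numeric count line of each source file.
            if line and not re.fullmatch(r"[0-9]+", line):
                entries.add(line)
    words = sorted(entries)
    return [str(len(words))] + words


# Demo with two made-up dictionaries written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    us = Path(tmp, "us.dic")
    gb = Path(tmp, "gb.dic")
    us.write_text("2\ncolor\nmaterialize/DSG\n", encoding="utf-8")
    gb.write_text("2\ncolour\nmaterialize/DSG\n", encoding="utf-8")
    combined = combine_dic([us, gb])

print("\n".join(combined))
```

The returned list can be written out with Path("en.dic").write_text("\n".join(combined) + "\n", encoding="utf-8").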

ipaleka answered Oct 12 '22 23:10