How to get domain of words using WordNet in Python?

Tags: python, nlp, nltk, wordnet

How can I find the domain of words using the nltk Python module and WordNet?

Suppose I have words like (transaction, Demand Draft, cheque, passbook) and the domain for all these words is "BANK". How can we get this using nltk and WordNet in Python?

I am trying to do this through the hypernym and hyponym relationships.

For example:

>>> from nltk.corpus import wordnet as wn
>>> sports = wn.synset('sport.n.01')
>>> sports.hyponyms()
[Synset('judo.n.01'), Synset('athletic_game.n.01'), Synset('spectator_sport.n.01'), Synset('contact_sport.n.01'), Synset('cycling.n.01'), Synset('funambulism.n.01'), Synset('water_sport.n.01'), Synset('riding.n.01'), Synset('gymnastics.n.01'), Synset('sledding.n.01'), Synset('skating.n.01'), Synset('skiing.n.01'), Synset('outdoor_sport.n.01'), Synset('rowing.n.01'), Synset('track_and_field.n.01'), Synset('archery.n.01'), Synset('team_sport.n.01'), Synset('rock_climbing.n.01'), Synset('racing.n.01'), Synset('blood_sport.n.01')]

and

>>> bark = wn.synset('bark.n.02')
>>> bark.hypernyms()
[Synset('noise.n.01')]


asked Feb 20 '14 08:02

Madhusudan

2 Answers

There is no explicit domain information in either the Princeton WordNet or NLTK's WordNet API.

I recommend getting a copy of the WordNet Domains resource and then linking your synsets to their domains; see http://wndomains.fbk.eu/

After you've registered and completed the download, you will find a wn-domains-3.2-20070223 text file. It is tab-delimited: the first column holds the offset-PartOfSpeech identifier of a synset, and the second column holds its domain tags, separated by spaces. For example:

00584282-v  military pedagogy
00584395-v  military school university
00584526-v  animals pedagogy
00584634-v  pedagogy
00584743-v  school university
00585097-v  school university
00585271-v  pedagogy
00585495-v  pedagogy
00585683-v  psychological_features
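To make the format concrete, here is a minimal sketch that parses lines in this layout into a synset-key to domains dictionary plus the inverse domain index. It reads a few of the sample lines from an in-memory string instead of the real file, so it runs without the download; the helper name `parse_wn_domains` is hypothetical. It splits on the first run of whitespace, which covers the tab in the real file and also copies of the sample where the tab has turned into spaces.

```python
from collections import defaultdict

# A few lines from the sample above, tab-separated as in the real file.
SAMPLE = """00584282-v\tmilitary pedagogy
00584395-v\tmilitary school university
00584526-v\tanimals pedagogy
00585683-v\tpsychological_features
"""


def parse_wn_domains(lines):
    """Hypothetical helper: build synset->domains and domain->synsets maps.

    Each line is '<8-digit offset>-<pos>', a tab, and a space-separated
    list of domain labels.
    """
    synset2domains = {}
    domain2synsets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split(None, 1) splits once, on the first run of tabs/spaces
        ssid, doms = line.split(None, 1)
        domains = doms.split()
        synset2domains[ssid] = domains
        for d in domains:
            domain2synsets[d].append(ssid)
    return synset2domains, domain2synsets


synset2domains, domain2synsets = parse_wn_domains(SAMPLE.splitlines())
print(synset2domains["00584395-v"])  # ['military', 'school', 'university']
print(domain2synsets["pedagogy"])    # ['00584282-v', '00584526-v']
```

With the real file you would pass the open file object instead of `SAMPLE.splitlines()`.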

Then use the following script to access the synsets' domain(s):

from collections import defaultdict
from nltk.corpus import wordnet as wn

# Load the WordNet Domains file: a synset key, a tab, then space-separated domains.
domain2synsets = defaultdict(list)
synset2domains = defaultdict(list)
with open('wn-domains-3.2-20070223', 'r') as f:
    for line in f:
        ssid, doms = line.strip().split('\t')
        doms = doms.split()
        synset2domains[ssid] = doms
        for d in doms:
            domain2synsets[d].append(ssid)

# Get the domains of each synset.
for ss in wn.all_synsets():
    # Key format: offset zero-padded to 8 digits, a hyphen, then the POS letter.
    ssid = str(ss.offset()).zfill(8) + "-" + ss.pos()
    if ssid in synset2domains:  # not all synsets are in WordNet Domains
        print(ss, ssid, synset2domains[ssid])

# Get the first three synsets of each domain.
for dom in sorted(domain2synsets):
    print(dom, domain2synsets[dom][:3])

Also look for WN-Affect within the WordNet Domains resource; it is very useful for disambiguating words for sentiment.

Since version 3.0, NLTK comes with the Open Multilingual WordNet (http://compling.hss.ntu.edu.sg/omw/), and because the French synsets share the same offset IDs, you can simply use WND as a crosslingual resource. The French lemma names can be accessed like this:

# Get the domains and French lemma names of each synset.
# Reuses synset2domains from the script above; needs the Open Multilingual
# WordNet data, e.g. nltk.download('omw-1.4').
for ss in wn.all_synsets():
    ssid = str(ss.offset()).zfill(8) + "-" + ss.pos()
    if ssid in synset2domains:  # not all synsets are in WordNet Domains
        print(ss, ss.lemma_names('fra'), ssid, synset2domains[ssid])

Note that NLTK 3 turned synset attributes into getter methods: Synset.offset became Synset.offset().
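The lookup key used in these scripts is the synset offset zero-padded to 8 digits, a hyphen, and the part-of-speech letter. Here is a small sketch of that conversion in both directions using plain integers and strings, so it runs without NLTK; the names `to_key` and `from_key` are hypothetical. With NLTK you would pass `ss.offset()` and `ss.pos()` to `to_key`.

```python
def to_key(offset: int, pos: str) -> str:
    """Format an offset/POS pair like the wn-domains file, e.g. '00584282-v'."""
    # :08d zero-pads to 8 digits, same as str(offset).zfill(8) for non-negative ints
    return f"{offset:08d}-{pos}"


def from_key(key: str) -> tuple:
    """Split a key such as '00584282-v' back into (584282, 'v')."""
    offset, pos = key.split("-")
    return int(offset), pos


print(to_key(584282, "v"))     # 00584282-v
print(from_key("00584282-v"))  # (584282, 'v')
```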

answered Oct 30 '22 16:10

alvas

As @alvas suggests, you can use WordNetDomains. You have to download both WordNet 2.0 and WordNetDomains, because in its current state WordNetDomains does not support the sense inventory of WordNet 3.0, which is the default WordNet version used by NLTK.

WordNet2.0 can be downloaded from here
WordNetDomains can be downloaded from here (after being granted permission).

I have created a very simple Python API that loads both resources in Python 3.x and provides some common routines you might need (such as getting the set of domains linked to a given term or to a given synset). The WordNetDomains loading code is taken from @alvas's answer.

This is what it looks like (with most comments omitted):

from collections import defaultdict
from os.path import exists

from nltk.corpus.reader.wordnet import WordNetCorpusReader


class WordNetDomains:
    def __init__(self, wordnet_home):
        # This class assumes you have downloaded WordNet 2.0 and WordNetDomains into the same data home.
        assert exists(f'{wordnet_home}/WordNet-2.0'), f'error: missing WordNet-2.0 in {wordnet_home}'
        assert exists(f'{wordnet_home}/wn-domains-3.2'), f'error: missing WordNetDomains in {wordnet_home}'

        # Load WordNet 2.0 (the second argument is an optional OMW reader, not needed here).
        self.wn = WordNetCorpusReader(f'{wordnet_home}/WordNet-2.0/dict', None)

        # Load WordNetDomains (based on https://stackoverflow.com/a/21904027/8759307).
        self.domain2synsets = defaultdict(list)
        self.synset2domains = defaultdict(list)
        with open(f'{wordnet_home}/wn-domains-3.2/wn-domains-3.2-20070223', 'r') as f:
            for line in f:
                ssid, doms = line.strip().split('\t')
                doms = doms.split()
                self.synset2domains[ssid] = doms
                for d in doms:
                    self.domain2synsets[d].append(ssid)

    def get_domains(self, word, pos=None):
        # Union of the domains of all synsets of `word`.
        domains = set()
        for synset in self.wn.synsets(word, pos=pos):
            domains.update(self.get_domains_from_synset(synset))
        return domains

    def get_domains_from_synset(self, synset):
        # Empty set if the synset is not in WordNetDomains.
        return set(self.synset2domains.get(self._askey_from_synset(synset), []))

    def get_synsets(self, domain):
        return [self._synset_from_key(key) for key in self.domain2synsets.get(domain, [])]

    def get_all_domains(self):
        return set(self.domain2synsets.keys())

    def _synset_from_key(self, key):
        offset, pos = key.split('-')
        return self.wn.synset_from_pos_and_offset(pos, int(offset))

    def _askey_from_synset(self, synset):
        return self._askey_from_offset_pos(synset.offset(), synset.pos())

    def _askey_from_offset_pos(self, offset, pos):
        return str(offset).zfill(8) + "-" + pos
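To get back to the original question (one domain for a list such as transaction, cheque, passbook), one option is to collect the domains of every term and keep the most frequent ones. The sketch below assumes an object with a `get_domains(word)` method like the class above. To stay runnable without WordNet 2.0, it uses a stub whose domain assignments are invented for illustration and are not taken from WordNetDomains; `StubDomains` and `dominant_domains` are hypothetical names.

```python
from collections import Counter


class StubDomains:
    """Stand-in for WordNetDomains; the mapping passed in is made up."""

    def __init__(self, mapping):
        self.mapping = mapping

    def get_domains(self, word, pos=None):
        return set(self.mapping.get(word, ()))


def dominant_domains(wnd, words, top_n=1):
    """Return the top_n (domain, number_of_words) pairs across `words`."""
    counts = Counter()
    for word in words:
        # get_domains returns a set, so each word votes at most once per domain
        counts.update(wnd.get_domains(word))
    return counts.most_common(top_n)


stub = StubDomains({
    "cheque": {"banking", "economy"},
    "passbook": {"banking"},
    "transaction": {"commerce", "banking"},
})
print(dominant_domains(stub, ["transaction", "cheque", "passbook"]))  # [('banking', 3)]
```

Counting each word once per domain keeps a term with many senses from outvoting the others; with real data you would pass a `WordNetDomains` instance instead of the stub.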

answered Oct 30 '22 17:10

Alex Moreo
