What is distant supervision?

Tags: nlp, supervised-learning, unsupervised-learning, stanford-nlp

As I understand it, distant supervision is the process of identifying the concept (relation) that the words of a passage, usually a sentence, are trying to convey.

For example, suppose a database stores the structured relation Concerns(NLP, this sentence).

Our distant supervision system would take as input the sentence: "This is a sentence about NLP."

From this sentence it would recognize the entities NLP and this sentence, since as a pre-processing step the sentence would have been passed through a named-entity recognizer.

Since our database records that NLP and this sentence are related by Concerns, the system would label the input sentence as expressing the relation Concerns(NLP, this sentence).
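The labelling step described above can be sketched in a few lines. This is a minimal sketch, assuming the NER step has already produced the entity mentions for each sentence; the toy KB, the extra sentences, and the names `distant_label` and `Example` are hypothetical. Every sentence that mentions a pair stored in the KB is labelled with that pair's relation, whether or not the sentence actually expresses it, and pairs the KB knows nothing about are labelled NA, which are commonly used (often subsampled) as negative examples.

```python
from itertools import combinations
from typing import NamedTuple

# Toy knowledge base of (relation, subject, object) facts.
# Hypothetical facts mirroring the question's example.
KB = {("Concerns", "NLP", "this sentence")}


class Example(NamedTuple):
    sentence: str
    subj: str
    obj: str
    relation: str  # "NA" when the KB holds no fact for the pair


def distant_label(corpus, kb):
    """Label every entity pair in every sentence using only the KB.

    `corpus` is a list of (sentence, mentions) pairs; the mentions are
    assumed to come from an upstream NER / entity-linking step.
    """
    examples = []
    for sentence, mentions in corpus:
        for a, b in combinations(mentions, 2):
            # Look the pair up in either direction, keeping the KB's orientation.
            facts = [(r, s, o) for r, s, o in kb if {s, o} == {a, b}]
            if facts:
                examples.extend(Example(sentence, s, o, r) for r, s, o in facts)
            else:
                # Pairs the KB knows nothing about become "NA" examples.
                examples.append(Example(sentence, a, b, "NA"))
    return examples


corpus = [
    ("This is a sentence about NLP.", ["this sentence", "NLP"]),
    # Mentions both entities without expressing Concerns, but is labelled anyway:
    # this is the noise that distant supervision accepts.
    ("NLP people rarely quote this sentence.", ["NLP", "this sentence"]),
    ("Parsing is part of NLP.", ["Parsing", "NLP"]),
]

for ex in distant_label(corpus, KB):
    print(ex.relation, ex.subj, ex.obj, "|", ex.sentence)
```

This prints `Concerns NLP this sentence` for the first two sentences and `NA Parsing NLP` for the third. The second line shows why distantly labelled data is noisy: co-occurrence with a KB fact is taken as evidence that the sentence expresses it.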

My question is twofold:

1) What is the use of that? Is it that later our system might see a sentence "in the wild", such as "That sentence is about OPP.", recognize that it has seen something similar before, and thereby infer the novel relation Concerns(OPP, that sentence), based only on the words/individual tokens?

2) Does it take into account the actual words of the sentence? For instance, the verb 'is' and the preposition 'about': does it realize (through WordNet or some other hyponymy system) that these are somehow similar to the higher-order concept "concerns"?

Does anyone have code for a distant supervision system that I could look at, i.e. a system that cross-references a KB such as Freebase with a corpus such as the NYTimes and produces a distant supervision dataset? I think that would go a long way toward clarifying my conception of distant supervision.


asked Apr 11 '15 08:04

smatthewenglish

2 Answers

RE 1) Yes, this is exactly right. In the end, what we want is a classifier that takes as input a piece of text and a pair of entity mentions in it, and tells us what relation holds between those entities in that sentence. Distant supervision is a way of mocking up training data for this classifier, using "distant" supervision from a known knowledge base. But the end goal is the same as in most machine learning tasks: generalize to new sentences.

RE 2) Certainly! Distant supervision only applies to how the training data is generated [1]. Once you've applied distant supervision, what you're left with is a corpus of (sentence, relation_for_sentence) pairs, and then you extract all of the usual NLP features from the sentence.

[1] To a first approximation. There are "distantly supervised" models (like MultiR and MIML-RE) that don't directly generate fake training data, but instead incorporate the supervision indirectly into the training procedure itself. Even in these, though, there is a factor in the latent-variable model that amounts to a per-sentence classification; it's just that the output variable is latent rather than naively "observed" as in vanilla distant supervision.
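To make both points concrete, here is a minimal sketch of the supervised half: lexical features plus a simple learner trained on distantly labelled tuples, then applied to a sentence with an entity it has never seen. Everything here is an illustrative assumption: the hand-written `train` list stands in for distantly labelled output, the feature set (words between the two mentions, their sequence, and the mention order) is a simplified version of the lexical features common in relation extraction, and the multiclass perceptron is swapped in for brevity (real systems typically use logistic regression or neural models). Note that words like "is" and "about" become features directly; WordNet-style generalization would be an optional extra feature, not something distant supervision provides.

```python
import re
from collections import defaultdict


def features(sentence, subj, obj):
    """Words between the two mentions, their sequence, and the mention order.

    The entity strings are deliberately left out, so the same features can
    fire for entities never seen in training (e.g. OPP).
    Assumes both mentions occur literally (case-insensitively) in the sentence.
    """
    s = sentence.lower()
    i, j = s.find(subj.lower()), s.find(obj.lower())
    if i < 0 or j < 0:
        raise ValueError("both mentions must occur in the sentence")
    if i < j:
        order, start, end = "subj_first", i + len(subj), j
    else:
        order, start, end = "obj_first", j + len(obj), i
    between = re.findall(r"\w+", s[start:end])
    feats = {"order=" + order, "between_seq=" + "_".join(between)}
    feats.update("between=" + w for w in between)
    return feats


def predict(weights, labels, feats):
    # Highest-scoring label; ties go to the first label in sorted order.
    return max(labels, key=lambda y: sum(weights[(y, f)] for f in feats))


def train_perceptron(data, epochs=10):
    """Multiclass perceptron over (sentence, subj, obj, label) tuples."""
    weights = defaultdict(float)  # (label, feature) -> weight
    labels = sorted({label for *_, label in data})
    for _ in range(epochs):
        for sentence, subj, obj, gold in data:
            feats = features(sentence, subj, obj)
            pred = predict(weights, labels, feats)
            if pred != gold:
                for f in feats:
                    weights[(gold, f)] += 1.0
                    weights[(pred, f)] -= 1.0
    return weights, labels


# Hand-written stand-in for distantly labelled output (hypothetical data).
train = [
    ("This sentence is about NLP.", "NLP", "this sentence", "Concerns"),
    ("That paper discusses parsing.", "parsing", "that paper", "Concerns"),
    ("Parsing is part of NLP.", "parsing", "NLP", "NA"),
    ("Smith wrote that paper.", "Smith", "that paper", "NA"),
]
weights, labels = train_perceptron(train)

# A new sentence with an entity the model has never seen.
new = features("That sentence is about OPP.", "OPP", "that sentence")
print(predict(weights, labels, new))
```

On this toy data the model prints `Concerns` for the OPP sentence, because it learned from the words "is about" and the mention order rather than from the entity names. That is the generalization described in RE 1.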
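The footnote can be illustrated with a small sketch of the "at least one sentence expresses the fact" idea behind multi-instance models. Instead of copying the KB label onto every sentence that mentions the pair, each sentence gets a latent label z_i chosen under a constraint: every z_i is one of the pair's KB relations or NA, and every KB relation must be covered by at least one sentence. The greedy procedure, the function name `constrained_assignment`, and the scores below are illustrative assumptions; this is not the actual inference algorithm of MultiR or MIML-RE.

```python
NA = "NA"


def constrained_assignment(scores, kb_relations):
    """Choose a latent per-sentence label z_i for one entity-pair bag.

    scores[i][r] is a (hypothetical) per-sentence model score for sentence i
    expressing relation r; missing entries count as -inf.
    Constraints, following the "at least one sentence" idea:
      * every z_i is one of the KB relations for this pair, or NA;
      * each KB relation is assigned to at least one sentence, if enough exist.
    This greedy version is a simplification, not the exact inference used
    by MultiR or MIML-RE.
    """
    def score(i, r):
        return scores[i].get(r, float("-inf"))

    z = [None] * len(scores)
    # 1) Cover each KB relation with its best-scoring unassigned sentence.
    for r in sorted(kb_relations):
        free = [i for i, label in enumerate(z) if label is None]
        if not free:
            break
        z[max(free, key=lambda i: score(i, r))] = r
    # 2) Remaining sentences take their best allowed label.
    allowed = sorted(set(kb_relations) | {NA})
    for i, label in enumerate(z):
        if label is None:
            z[i] = max(allowed, key=lambda r: score(i, r))
    return z


# Made-up scores for the bag of sentences mentioning (NLP, this sentence).
bag = [
    {"Concerns": 2.0, "NA": 0.5},   # "This sentence is about NLP."
    {"Concerns": -1.0, "NA": 1.5},  # "NLP people rarely quote this sentence."
]
print(constrained_assignment(bag, {"Concerns"}))
```

This prints `['Concerns', 'NA']`: only the sentence that best supports the KB fact is treated as expressing it, whereas vanilla distant supervision would label both sentences Concerns. In a MultiR-style training loop, the per-sentence classifier would then be updated toward these latent labels; that loop is omitted here.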


answered Sep 25 '22 18:09

Gabor Angeli

As I now understand it, the real value of distant supervision is that we can use it to annotate a large corpus without having to consider each sentence manually, which is very expensive in person-hours. Some of the relations it assigns to sentences will be false, but the labelling will hopefully be "pretty good", which is useful in some applications, such as... academics competing with each other for marginally better scores on this silly task, and... other things such as... (examples are welcome).

answered Sep 22 '22 18:09

smatthewenglish
