I have a use case where I want to extract main meaningful part of the sentence using spacy or nltk or any NLP libraries.
Example sentence1: "How Can I raise my voice against harassment" Intent would be: "raise voice against harassment"
Example sentence2: "Donald Duck is created by which cartoonist/which man/whom ?" Intent would be: "Donald duck is created by"
Example sentence3: "How to retrieve the main intent of a sentence using spacy or nltk" ? Intent: "retrieve main intent of sentence using spacy nltk"
I am new to dependency parsing and don't exactly know how to do this. Please help me.
You have to define the ultimate task you want to perform and define what exactly is "intent" / "main information" or "meaning of text".
From first look, it seems like you're asking to solve a natural language problem magically. But lets look at the question and what you're really asking, lets avoid all the notion of intent/labels or language (for a while) and just look at what's the in-/output:
[in]: "How Can I raise my voice against harassment"
[out]: "raise voice against harassment"
[in]: "Donald Duck is created by which cartoonist/which man/whom ?"
[out]: "Donald duck is created by"
[in]: "How to retrieve the main intent of a sentence using spacy or nltk ?"
[out]: "retrieve main intent of sentence using spacy nltk"
It seems like all your output tokens/words are just a quote from your input, in that case what if you simply treat your problem as a "span/sequence annotation" task, i.e.
[in]: "How Can I raise my voice against harassment"
[out]: [0, 0, 0, 1, 0, 1, 1, 1]
[in]: "Donald Duck is created by which cartoonist/which man/whom ?"
[out]: [1, 1, 1, 1, 0, 0, 0]
[in]: "How to retrieve the main intent of a sentence using spacy or nltk ?"
[out]: [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Assuming that each word is a binary label, the output should label 1
for the words that you want to extract from the input and 0
for the ones you don't.
Now given it's a simple binary sequence labeling task, one could simply do:
But step back a little,
Okay, even if we don't talk about "intent" and just want to extract the main meaning,
What is dependency parsing?
In short, it provides a structured representation of text. But non of the structure in traditional dependency formalism has notion of "intent".
Proof: CTR + F on https://web.stanford.edu/~jurafsky/slp3/15.pdf
So I don't think just simply parsing the text with dependency trees would help unless the notion of "intent" is better defined in your scenario.
From https://github.com/explosion/spaCy/blob/master/examples/training/train_intent_parser.py
Yes, that's an example of using a combination parsing labels and sequence labeling and defining that as "intent", more specifically, we see examples from https://github.com/explosion/spaCy/blob/master/examples/training/train_intent_parser.py#L31
TRAIN_DATA = [
(
"find a cafe with great wifi",
{
"heads": [0, 2, 0, 5, 5, 2], # index of token head
"deps": ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"],
},
),
(
"find a hotel near the beach",
{
"heads": [0, 2, 0, 5, 5, 2],
"deps": ["ROOT", "-", "PLACE", "QUALITY", "-", "ATTRIBUTE"],
},
),
Each training data is made up of
And an example in/outputs from https://github.com/explosion/spaCy/blob/master/examples/training/train_intent_parser.py#L173
[in]: find a hotel with good wifi
[out]:
[
('find', 'ROOT', 'find'),
('hotel', 'PLACE', 'find'),
('good', 'QUALITY', 'wifi'),
('wifi', 'ATTRIBUTE', 'hotel')
]
The example above shows that the whole list of triplets are defined as an intent, rather than just the raw strings. The triplets refers to the (dependent, relation, head)
, e.g. the hotel
is the PLACE
to find
from the triplets ('hotel', 'PLACE', 'find')
.
Note: This is solely SpaCy notion of "semantics" or "intent" which is not wrong but well-defined and hence a model to perform this task is trainable in a supervised machine learning paradigm. Details, see https://spacy.io/usage/examples
Depending on how and what you define as intent/semantics, the in/outputs will change and the model to train may be different.
Because what does "main meaning" or "intent" mean if it's just a string?
We go back to the lack of definition that makes the task a magical one rather than one that computers can perform.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With