 

Detect passive or active sentence from text

Tags: python, nlp, spacy

Using the Python package spaCy, how can one detect whether a sentence uses a passive or active voice? For example, the following sentences should be detected as using a passive and active voice respectively:

passive_sentence = "John was accused of committing crimes by David"
# passive voice "John was accused"

active_sentence = "David accused John of committing crimes"
# active voice "David accused John"
sruthi asked Oct 17 '25 05:10


2 Answers

The following solution uses spaCy's rule-based matching engine to detect and display the parts of a sentence that use the active or passive voice. No method will correctly identify 100% of sentences, especially the more complex ones. However, the solution below handles the vast majority of cases and can likely be improved to handle more edge cases.

Overview of Rule/Pattern Matching

The key components are the rules you provide to the matcher. I'll explain one of the passive voice rules below. If you understand one, you should be able to understand all the other rules and begin to construct your own rules to match particular patterns using the spaCy token-based matching documentation. Consider the following passive voice rule:

[{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'VBN'}]

This rule/pattern is used by the matcher to find a sequential combination of tokens. Specifically, the matcher will:

  1. Find a token whose dependency label (DEP) is passive nominal subject (nsubjpass).
  2. Find a token whose DEP is passive auxiliary (auxpass), preceded by zero or more tokens whose DEP is auxiliary (aux). Note that the key "OP" stands for "operator", which defines how often a token pattern should be matched. See the operators and quantifiers subsection of the spaCy documentation for more information.
  3. Find a final token whose part of speech is tagged (TAG) as verb past participle (VBN).
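
As a minimal sketch of the three steps above, the rule can be tried on a hand-annotated Doc, so no trained pipeline is needed. The tags, dependency labels, and heads below are assumptions written by hand for illustration, not model output; a real pipeline may label the tokens differently.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-written annotations (assumed, not model output) for "He will be sent away"
words = ["He", "will", "be", "sent", "away"]
tags = ["PRP", "MD", "VB", "VBN", "RB"]
deps = ["nsubjpass", "aux", "auxpass", "ROOT", "advmod"]
heads = [3, 3, 3, 3, 3]  # absolute index of each token's head ("sent" is the root)

vocab = Vocab()
doc = Doc(vocab, words=words, tags=tags, heads=heads, deps=deps)

# The matcher must be created with the same vocab as the doc
matcher = Matcher(vocab)
passive_rule = [
    {"DEP": "nsubjpass"},        # step 1: passive nominal subject
    {"DEP": "aux", "OP": "*"},   # step 2a: zero or more auxiliaries
    {"DEP": "auxpass"},          # step 2b: passive auxiliary
    {"TAG": "VBN"},              # step 3: past participle
]
matcher.add("Passive", [passive_rule])

matched_spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched_spans)
```

Here "will" is consumed by the optional aux token, "be" by auxpass, and "sent" by the VBN token, so the whole sequence from "He" to "sent" forms one match.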

If you are unfamiliar with Part of Speech (PoS) tags, please see this tutorial. Additionally, in-depth explanations of the dependency labels and what they mean are provided on the Universal Dependencies (UD) dependency documentation page.
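
For a quick reminder of what a label means without leaving the interpreter, spaCy's built-in glossary lookup can help. This is only a small convenience sketch; the list of labels is chosen here to match the rule discussed above.

```python
import spacy

# Labels used in the passive voice rule discussed above
labels = ["nsubjpass", "aux", "auxpass", "VBN"]

# spacy.explain returns a short description, or None for unknown labels
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label}: {description}")
```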

Solution

import spacy
from spacy.matcher import Matcher

passive_sentences = [
    "John was accused of committing crimes by David.",
    "She was sent a cheque for a thousand euros.",
    "He was given a book for his birthday.",
    "He will be sent away to school.",
    "The meeting was called off.",
    "He was looked after by his grandmother.",
]
active_sentences = [
    "David accused John of committing crimes.",
    "Someone sent her a cheque for a thousand euros.",
    "I gave him a book for his birthday.",
    "They will send him away to school.",
    "They called off the meeting.",
    "His grandmother looked after him."
]
composite_sentences = [
    "Three men seized me, and I was carried to the car."
]

# Load spaCy pipeline (model)
nlp = spacy.load('en_core_web_trf')
# Create pattern to match passive voice use
passive_rules = [
    [{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'VBN'}],
    [{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'VBZ'}],
    [{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'RB'}, {'TAG': 'VBN'}],
]
# Create pattern to match active voice use
active_rules = [
    [{'DEP': 'nsubj'}, {'TAG': 'VBD', 'DEP': 'ROOT'}],
    [{'DEP': 'nsubj'}, {'TAG': 'VBP'}, {'TAG': 'VBG', 'OP': '!'}],
    [{'DEP': 'nsubj'}, {'DEP': 'aux', 'OP': '*'}, {'TAG': 'VB'}],
    [{'DEP': 'nsubj'}, {'DEP': 'aux', 'OP': '*'}, {'TAG': 'VBG'}],
    [{'DEP': 'nsubj'}, {'TAG': 'RB', 'OP': '*'}, {'TAG': 'VBG'}],
    [{'DEP': 'nsubj'}, {'TAG': 'RB', 'OP': '*'}, {'TAG': 'VBZ'}],
    [{'DEP': 'nsubj'}, {'TAG': 'RB', 'OP': '+'}, {'TAG': 'VBD'}],
]

matcher = Matcher(nlp.vocab)  # Initialise the matcher; it must share its vocab with the docs it processes
matcher.add('Passive', passive_rules)  # Add passive rules to matcher
matcher.add('Active', active_rules)  # Add active rules to matcher
text = passive_sentences + active_sentences + composite_sentences  # Combine various passive/active sentences

for sentence in text:
    doc = nlp(sentence)  # Process text with spaCy model
    matches = matcher(doc)  # Get matches
    print("-"*40 + "\n" + sentence)
    if len(matches) > 0:
        for match_id, start, end in matches:
            string_id = nlp.vocab.strings[match_id]
            span = doc[start:end]  # the matched span
            print("\t{}: {}".format(string_id, span.text))
    else:
        print("\tNo active or passive voice detected.")

Output

----------------------------------------
John was accused of committing crimes by David.
    Passive: John was accused
----------------------------------------
She was sent a cheque for a thousand euros.
    Passive: She was sent
----------------------------------------
He was given a book for his birthday.
    Passive: He was given
----------------------------------------
He will be sent away to school.
    Passive: He will be sent
----------------------------------------
The meeting was called off.
    Passive: meeting was called
----------------------------------------
He was looked after by his grandmother.
    Passive: He was looked
----------------------------------------
David accused John of committing crimes.
    Active: David accused
----------------------------------------
Someone sent her a cheque for a thousand euros.
    Active: Someone sent
----------------------------------------
I gave him a book for his birthday.
    Active: I gave
----------------------------------------
They will send him away to school.
    Active: They will send
----------------------------------------
They called off the meeting.
    Active: They called
----------------------------------------
His grandmother looked after him.
    Active: grandmother looked
----------------------------------------
Three men seized me, and I was carried to the car.
    Active: men seized
    Passive: I was carried
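
The output above lists matched spans, while the question asks for one verdict per sentence. One possible way to get there, sketched below, is to collapse the match labels for a sentence into a single result. The helper name classify_voice and the "mixed"/"unknown" outcomes are assumptions added for illustration; in the loop above, the labels could be collected with [nlp.vocab.strings[match_id] for match_id, _, _ in matches].

```python
def classify_voice(labels):
    """Collapse the match labels found in one sentence into a single verdict.

    `labels` is an iterable of the strings 'Active' and/or 'Passive'.
    This helper is hypothetical and not part of spaCy.
    """
    found = set(labels)
    has_passive = "Passive" in found
    has_active = "Active" in found
    if has_passive and has_active:
        return "mixed"      # e.g. a compound sentence with both voices
    if has_passive:
        return "passive"
    if has_active:
        return "active"
    return "unknown"        # no rule matched


print(classify_voice(["Passive"]))            # passive
print(classify_voice(["Active", "Passive"]))  # mixed
print(classify_voice([]))                     # unknown
```

Reporting "mixed" rather than forcing a choice keeps compound sentences such as the last example visible instead of silently picking one voice.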
Kyle F. Hartzenberg answered Oct 18 '25 17:10


There is no easy solution for this. If you want something simple, accuracy will likely take a hit. There is a wealth of information on using NLP to detect passive and active voice in text; proprietary algorithms are the most accurate, but they come at a cost.

If this is for a hobby project, you could get a quick solution by trying out something like this, but if you follow the comments there, you'll notice that even its accuracy does not reach 99%, or even 90%.

You'll need a more complex approach for higher accuracy, but don't expect 99%.

Remzinho answered Oct 18 '25 18:10


