I'm looking for syntactic examples or common techniques for doing regular-expression-style transformations on words instead of characters, in a procedural language.
For example, to trace copying, one would want to create a document with similar meaning but with different word choices.
I'd like to be able to concisely define these possible transformations that I can apply to a text stream.
E.g. "fast noun" to "rapid noun", but "go fast." wouldn't get transformed (no noun afterwards).
Or: "Alice will sing song" to "song will be sung by Alice"
I'd expect this to be done in grammatical checkers, such as detecting passive voice.
A C# implementation for this sort of language processing would be really neat, but I think the bulk of any effort is coming up with the right rules. Keeping the rules clear and understandable seems like a place to begin.
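As an illustration of the kind of rule described above, here is a minimal Python sketch of pattern matching over words rather than characters. Everything in it is an assumption made for the example: the `NOUNS` set is a hypothetical stand-in for a real part-of-speech tagger, and the rule format (a list of literal words or predicates, mapped to a list of literal words or indices into the match) is just one possible way to keep rules readable.

```python
# Hypothetical toy lexicon; a real system would use a part-of-speech tagger.
NOUNS = {"car", "train", "song", "runner"}


def is_noun(word):
    """Return True if the word (ignoring trailing punctuation) is a known noun."""
    return word.lower().strip(".,!?") in NOUNS


# Each rule is (pattern, replacement).
# A pattern element is a literal lowercase word or a predicate on a word.
# A replacement element is a literal word or an int index into the matched words.
RULES = [
    (["fast", is_noun], ["rapid", 1]),
]


def matches(pattern, words):
    """Check a window of words against a pattern of the same length."""
    return all(
        p(w) if callable(p) else p == w.lower()
        for p, w in zip(pattern, words)
    )


def transform(text, rules=RULES):
    """Scan left to right, applying the first rule that matches at each position."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        for pattern, replacement in rules:
            window = words[i:i + len(pattern)]
            if len(window) == len(pattern) and matches(pattern, window):
                out.extend(
                    window[r] if isinstance(r, int) else r for r in replacement
                )
                i += len(pattern)
                break
        else:
            # No rule matched here; copy the word through unchanged.
            out.append(words[i])
            i += 1
    return " ".join(out)


print(transform("the fast train."))  # the rapid train.
print(transform("go fast."))         # go fast.
```

Fixed-width windows like this only cover local substitutions. A reordering such as the "Alice will sing song" example would probably need variable-length matches or a parse tree, which this sketch does not attempt.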
You could try WordNet::QueryData, a Perl module by Jason Rennie (CPAN distribution WordNet-QueryData-1.47).
One good place to start researching would be WordNet. It's a dictionary of semantics that groups words together by similar meaning and also records the relationships between words in useful ways.
There are a bunch of software projects leveraging the WordNet data; one of them may be what you need.