I'm looking for an existing library to summarize or paraphrase content (I'm aiming at blog posts) - any experience with existing natural language processing libraries?
I'm open to a variety of languages, so I'm more interested in the abilities & accuracy.
In a recent survey of rewriter tools available to students and academics to reduce plagiarism, Ref-n-write was rated as the best scholarly paraphrasing tool.
Paraphrasing a sentence means creating a new sentence that expresses the same meaning using a different choice of words.
There was some discussion of Grok. This is now supported as OpenCCG, and will be reimplemented in OpenNLP as well.
You can find OpenCCG at http://openccg.sourceforge.net/. I would also suggest the Curran and Clark CCG parser available here: http://svn.ask.it.usyd.edu.au/trac/candc/wiki
Basically, for paraphrase, you're going to need to write something that first parses the sentences of the blog posts, extracts their semantic meaning, then searches through the space of vocab words that compositionally create the same semantic meaning, and finally picks a result that doesn't match the current sentence. This will take a long time, and the output might not make a lot of sense. Don't forget that in order to do this, you're going to need near-perfect anaphora resolution and the ability to pick up discourse-level inferences.
If you're just looking to make blog posts that don't have machine-identifiable duplicate content, you can always just use topic and focus transformations and WordNet synonyms. Sites have definitely done this before and made money off of AdWords.
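To make the synonym-swap idea concrete, here is a minimal sketch in Python. It is only an illustration: TOY_SYNONYMS is a tiny hand-written table standing in for a real WordNet lookup, and substitute_synonyms is a made-up helper name, not part of any library. A real version would also need word-sense disambiguation, since blindly taking the first synonym often changes the meaning.

```python
import re

# ASSUMPTION: a tiny hand-written table standing in for a WordNet lookup.
# A real system would query WordNet and choose a synonym for the right sense.
TOY_SYNONYMS = {
    "quick": ["fast", "rapid"],
    "big": ["large"],
    "make": ["create"],
}


def substitute_synonyms(text, synonyms=TOY_SYNONYMS):
    """Replace each known word with its first listed synonym.

    Words that are not in the table are left untouched, and a leading
    capital letter is carried over to the replacement.
    """
    def swap(match):
        word = match.group(0)
        options = synonyms.get(word.lower())
        if not options:
            return word
        replacement = options[0]
        return replacement.capitalize() if word[0].isupper() else replacement

    # Only runs of ASCII letters are treated as words; punctuation is kept as-is.
    return re.sub(r"[A-Za-z]+", swap, text)


if __name__ == "__main__":
    print(substitute_synonyms("Quick scripts make big posts."))
    # prints: Fast scripts create large posts.
```

Note that this only defeats exact-match duplicate detection; it does nothing for readability, and it will happily produce nonsense when a word has several senses.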
I think he wants to generate blog posts by automatically paraphrasing whatever is in the blogs this system is monitoring.
This would be really interesting if you could combine 2 to 10 blog posts that are similar, but from different sources and then do a paraphrased "real" summary automatically (the size of 1 blog post).
It could also be great for homework. Unfortunately, it's not that easy to do.
The only way I can see is to decompose every sentence into its "meaning", and then randomly change the sentence structure and some of the words while retaining that meaning.
These sentences mean the same:
It would be nontrivial to write a program to transform one of these sentences into the others, and these are simple sentences. Real sentences from blogs are much more complicated.
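As a rough illustration of the "decompose into meaning, then re-render" idea, here is a toy Python sketch. Everything in it is an assumption made for the example: the Event record (agent, verb, patient) is a deliberately crude stand-in for a real semantic representation, the sample sentence is invented, and the three templates only handle singular, past-tense, transitive clauses. The hard part, getting from raw blog text to something like Event, is exactly what this sketch skips.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical, very crude 'meaning' of a simple transitive sentence."""
    agent: str            # who did it, e.g. "the dog"
    verb_past: str        # simple past form, e.g. "ate"
    verb_participle: str  # past participle, e.g. "eaten"
    patient: str          # what it was done to, e.g. "the bone"


def _sentence(words):
    # Capitalize only the first character and add a final period.
    return words[0].upper() + words[1:] + "."


def render_active(e):
    return _sentence(f"{e.agent} {e.verb_past} {e.patient}")


def render_passive(e):
    # Assumes a singular patient ("was"); plural agreement is not handled.
    return _sentence(f"{e.patient} was {e.verb_participle} by {e.agent}")


def render_cleft(e):
    # A focus transformation that puts the agent up front.
    return _sentence(f"it was {e.agent} that {e.verb_past} {e.patient}")


def paraphrases(e, exclude=None):
    """Return every surface form of the event except the one to avoid."""
    candidates = [render_active(e), render_passive(e), render_cleft(e)]
    return [c for c in candidates if c != exclude]


if __name__ == "__main__":
    event = Event("the dog", "ate", "eaten", "the bone")
    original = render_active(event)
    for alternative in paraphrases(event, exclude=original):
        print(alternative)
    # prints:
    # The bone was eaten by the dog.
    # It was the dog that ate the bone.
```

Even for this trivial case the templates need hand-supplied verb forms, which gives a sense of how much machinery real blog sentences would require.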