What's a good natural language library to use for paraphrasing? [closed]

I'm looking for an existing library to summarize or paraphrase content (mainly blog posts). Does anyone have experience with existing natural language processing libraries?

I'm open to a variety of languages, so I'm more interested in capabilities and accuracy.

jeffreypriebe asked Aug 24 '08 20:08


2 Answers

There was some discussion of Grok. This is now supported as OpenCCG, and will be reimplemented in OpenNLP as well.

You can find OpenCCG at http://openccg.sourceforge.net/. I would also suggest the Curran and Clark CCG parser available here: http://svn.ask.it.usyd.edu.au/trac/candc/wiki

Basically, to paraphrase you need to write something that first parses the sentences of a blog post, extracts their semantic meaning, searches the space of vocabulary words for combinations that compositionally produce the same meaning, and then picks one that doesn't match the original sentence. This will take a long time, and the output might not make a lot of sense. Don't forget that to do this you will also need near-perfect anaphora resolution and the ability to pick up discourse-level inferences.

If you're just looking to make blog posts that don't contain machine-identifiable duplicate content, you can always just use topic and focus transformations and WordNet synonyms. Sites have definitely made money from AdWords doing this before.
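
As a rough illustration of the synonym-substitution idea, here is a minimal Python sketch. It does not call WordNet: the `SYNONYMS` table is a hypothetical hand-written stand-in for whatever a real WordNet lookup would return, and `substitute_synonyms` and its `pick` parameter are names made up for this example. It also ignores word sense and part of speech, which a real implementation would have to deal with.

```python
import re

# Hypothetical hand-written synonym table, standing in for WordNet lookups.
SYNONYMS = {
    "hate": ["despise", "detest"],
    "dumb": ["stupid", "foolish"],
    "guy": ["man", "fellow"],
}


def substitute_synonyms(sentence, synonyms=SYNONYMS, pick=0):
    """Replace every word found in the table with one of its listed synonyms."""

    def swap(match):
        word = match.group(0)
        options = synonyms.get(word.lower())
        if not options:
            return word  # unknown word: leave it untouched
        replacement = options[pick % len(options)]
        # Keep the capitalisation of the first letter.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z']+", swap, sentence)


print(substitute_synonyms("I hate this guy, he is so dumb."))
# I despise this man, he is so stupid.
```

Changing `pick` selects a different entry from each synonym list, which is roughly how such a tool would produce several variants of the same post.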

Robert Elwell answered Oct 26 '22 06:10


I think he wants to generate blog posts by automatically paraphrasing whatever is in the blogs this system is monitoring.

This would be really interesting if you could combine 2 to 10 blog posts that are similar, but from different sources and then do a paraphrased "real" summary automatically (the size of 1 blog post).

It could also be great for homework. Unfortunately it's not that easy to do.

The only way I can see is to decompose every sentence into its "meaning", and then randomly change the sentence structure and some of the words while retaining that meaning.

These sentences mean the same:

  • I hate this guy, he is so dumb.
  • This guy is stupid, I hate him.
  • I despise this dumb guy.
  • He is dumb, I hate him.

It would be nontrivial to write a program that transforms one of these sentences into the others, and these are simple sentences; real sentences from blogs are much more complicated.
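
To make the "decompose into meaning, then regenerate" idea concrete, here is a toy Python sketch. Everything in it is an assumption made for illustration: the `Meaning` record, its four fields, the `TEMPLATES` list, and the `realize` function. It only covers the easy half (turning a meaning back into sentences) and takes the meaning as given; extracting that meaning from arbitrary blog text is the hard part and is not attempted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    """Toy meaning record: who feels what about whom, and why."""
    experiencer: str  # e.g. "I"
    attitude: str     # e.g. "hate"
    target: str       # determiner + noun, e.g. "this guy"
    trait: str        # e.g. "dumb"


# Hypothetical surface templates that all express the same Meaning.
TEMPLATES = [
    "{experiencer} {attitude} {target}, he is so {trait}.",
    "{target_cap} is {trait}, {experiencer} {attitude} him.",
    "{experiencer} {attitude} {det} {trait} {noun}.",
]


def realize(meaning):
    """Render one Meaning through every template."""
    det, _, noun = meaning.target.partition(" ")
    fields = {
        "experiencer": meaning.experiencer,
        "attitude": meaning.attitude,
        "target": meaning.target,
        "target_cap": meaning.target[0].upper() + meaning.target[1:],
        "trait": meaning.trait,
        "det": det,
        "noun": noun,
    }
    return [template.format(**fields) for template in TEMPLATES]


for sentence in realize(Meaning("I", "hate", "this guy", "dumb")):
    print(sentence)
# I hate this guy, he is so dumb.
# This guy is dumb, I hate him.
# I hate this dumb guy.
```

Even this toy version hard-codes the pronoun "he"/"him" and a fixed sentence pattern, which hints at how much more is needed for real text.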

Osama Al-Maadeed answered Oct 26 '22 06:10