 

Generating paraphrases of English text using PPDB

I need to generate paraphrases of an English sentence using the PPDB paraphrase database.

I have downloaded the datasets from the website.

Jaffer Wilson asked Oct 30 '22 05:10


1 Answer

I would say your first step needs to be breaking the problem down into more manageable components. Second, figure out whether you want to paraphrase on a one-to-one, lexical, syntactic, phrasal, or combined basis. To inform this decision, I would take one sentence and paraphrase it myself to get an idea of what I am looking for. Next, I would start writing a parser for the downloaded data. Then I would remove the stopwords and run a part-of-speech tagger, like the ones included in spaCy or NLTK, on your example phrase.
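A minimal sketch of such a parser, assuming each line of the download follows a ` ||| `-separated layout of the form `LHS ||| PHRASE ||| PARAPHRASE ||| FEATURES ||| ...` with space-separated `Name=value` features (check this against the README shipped with your files). The names `PPDBRule`, `parse_features`, and `parse_ppdb_lines` are hypothetical, and the sample line in the demo is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator


@dataclass
class PPDBRule:
    lhs: str          # syntactic label, e.g. "[NN]"
    phrase: str       # source phrase
    paraphrase: str   # target phrase
    features: Dict[str, float] = field(default_factory=dict)


def parse_features(text: str) -> Dict[str, float]:
    """Turn space-separated 'Name=value' tokens into a dict of floats."""
    feats: Dict[str, float] = {}
    for item in text.split():
        name, sep, value = item.rpartition("=")
        if not sep:
            continue  # token without '=': ignore it
        try:
            feats[name] = float(value)
        except ValueError:
            continue  # non-numeric value: ignore it
    return feats


def parse_ppdb_lines(lines: Iterable[str]) -> Iterator[PPDBRule]:
    """Yield one PPDBRule per line; lines with fewer than 4 fields are skipped.

    Splitting on the spaced delimiter " ||| " leaves feature names that
    contain a single '|' (e.g. "p(e|f)") intact.
    """
    for line in lines:
        cols = [c.strip() for c in line.rstrip("\n").split(" ||| ")]
        if len(cols) < 4:
            continue
        lhs, phrase, paraphrase, feats = cols[:4]
        yield PPDBRule(lhs, phrase, paraphrase, parse_features(feats))


if __name__ == "__main__":
    # Invented sample line, for illustration only
    sample = ["[NN] ||| business now ||| businessnow ||| WordLenDiff=-1.5 p(e|f)=0.2 ||| 0-0 1-0"]
    for rule in parse_ppdb_lines(sample):
        print(rule.lhs, rule.phrase, "->", rule.paraphrase, rule.features)
```

To read a real file, pass an open file handle instead of the list, for example `gzip.open(path, "rt", encoding="utf-8")` if your download is gzip-compressed. Because the parser is a generator, you can filter rules (say, by LHS) while streaming instead of holding the whole file in memory.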

The PPDB files seem to give you all the information needed to build a successive dictionary filter, so that is where I would start. I would write a filter that looks up each word of my sentence, together with its part of speech, in the [LHS] column of the dataset, and selects an entry whose source matches the word while minimizing or maximizing the value of one feature (for example, minimizing WordLenDiff, which is -1.5 in the case of "businessnow" <- "business now"). If you keep track of that target feature's value as you go, you will end up with a basic paraphrased sentence and a score for it.
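The dictionary filter described above might look something like this sketch. It assumes the rules are already parsed into `(lhs, phrase, paraphrase, features)` tuples and that POS tags are Penn Treebank-style, so a tag `NN` matches an LHS label `[NN]`. The stopword list is a tiny hypothetical one, `build_index` and `paraphrase` are illustrative names rather than part of any PPDB tooling, and the rules and values in the demo are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (lhs, phrase, paraphrase, features): a minimal stand-in for parsed PPDB rows
Rule = Tuple[str, str, str, Dict[str, float]]

# Hypothetical, deliberately tiny stopword list; in practice use NLTK's or spaCy's
STOPWORDS = {"the", "a", "an", "of", "to", "and"}


def build_index(rules: List[Rule]) -> Dict[Tuple[str, str], List[Rule]]:
    """Group rules by (LHS label, lower-cased source phrase) for fast lookup."""
    index: Dict[Tuple[str, str], List[Rule]] = defaultdict(list)
    for rule in rules:
        index[(rule[0], rule[1].lower())].append(rule)
    return index


def paraphrase(tagged: List[Tuple[str, str]],
               index: Dict[Tuple[str, str], List[Rule]],
               feature: str = "WordLenDiff",
               minimize: bool = True) -> Tuple[List[str], float]:
    """Drop stopwords, then swap each remaining word for the rule that
    minimizes (or maximizes) one feature; return tokens and summed score."""
    pick = min if minimize else max
    out: List[str] = []
    score = 0.0
    for word, tag in tagged:
        if word.lower() in STOPWORDS:
            continue  # stopwords are removed before lookup
        # Only consider rules whose LHS matches the word's POS tag and
        # that actually carry the chosen feature.
        candidates = [r for r in index.get((f"[{tag}]", word.lower()), [])
                      if feature in r[3]]
        if not candidates:
            out.append(word)  # no usable rule: keep the original word
            continue
        best = pick(candidates, key=lambda r: r[3][feature])
        out.append(best[2])
        score += best[3][feature]
    return out, score


if __name__ == "__main__":
    # Invented rules and feature values, for illustration only
    rules = [
        ("[NN]", "business", "firm", {"WordLenDiff": -2.0}),
        ("[NN]", "business", "company", {"WordLenDiff": -1.0}),
        ("[VBZ]", "uses", "employs", {"WordLenDiff": 2.0}),
    ]
    tagged = [("the", "DT"), ("business", "NN"), ("uses", "VBZ")]
    tokens, score = paraphrase(tagged, build_index(rules))
    print(tokens, score)  # ['firm', 'employs'] 0.0
```

In a real pipeline the `(word, tag)` pairs would come from a tagger such as NLTK's `pos_tag`; dropping stopwords entirely, as done here, makes the output less fluent, so you may prefer to copy them through unchanged instead.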

Using this strategy, your output could turn:

"the business uses 4 gb standard."
sent_score = 0

into:

"businessnow uses 4gb standard"
sent_score = -3
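The sample output also merges "4 gb" into "4gb", a multi-word rewrite that a strictly word-by-word lookup will not produce. One option for using PPDB's phrasal entries is a greedy longest-match over n-grams, sketched below as an assumption rather than as the method the answer prescribes. The `Table` layout, the function name, and the values in the demo are all hypothetical; a real table would be built from the parsed PHRASE, PARAPHRASE, and feature columns.

```python
from typing import Dict, List, Tuple

# source phrase -> [(paraphrase, feature value), ...]; values here are invented
Table = Dict[str, List[Tuple[str, float]]]


def greedy_phrase_paraphrase(tokens: List[str], table: Table,
                             max_len: int = 3) -> Tuple[str, float]:
    """Rewrite left to right, preferring the longest source phrase in `table`.

    At each position, spans of max_len, max_len - 1, ..., 1 tokens are tried;
    the first span found is replaced by its lowest-valued paraphrase.
    """
    out: List[str] = []
    score = 0.0
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in table:
                target, value = min(table[span], key=lambda c: c[1])
                out.append(target)
                score += value
                i += n
                break
        else:
            out.append(tokens[i])  # no phrase starts here: copy the token
            i += 1
    return " ".join(out), score


if __name__ == "__main__":
    table = {
        "4 gb": [("4gb", -1.0)],
        "business": [("firm", -2.0), ("company", -1.0)],
    }
    print(greedy_phrase_paraphrase("the business uses 4 gb standard".split(), table))
    # ('the firm uses 4gb standard', -3.0)
```

Greedy longest-match is simple and fast, but it can commit to a long phrase early and miss a better overall rewrite; scoring alternative segmentations is the natural next step if that matters for your use.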

Once you have a basic example working, you can start exploring feature-selection algorithms, such as those in scikit-learn, and incorporate word alignment. But I would seriously cut down the scope of the problem at first and increase it gradually. In the end, how you approach the problem depends on its intended use and how functional the result needs to be.

Hope this helps.

Dan Temkin answered Nov 15 '22 07:11