
How to automatically excerpt user-generated content?

I run a website that lets users write blog posts. I would really like to summarize the written content automatically and use the summary to fill, for example, the <meta name="description".../> tag.

What methods can I use to automatically summarize or describe user-generated content?
Are there any existing (preferably free) solutions to this problem?

(I've seen other websites just copy the first 100 or so words, but this strikes me as a sub-optimal solution.)

Jacco asked Dec 10 '22 19:12


1 Answer

Think of the task of summarization as a challenge to 'select the most important sentences' from the document.

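The Automatic Creation of Literature Abstracts by H.P. Luhn (1958) describes a naive method that actually performs quite well: it scores each sentence by how densely it packs the document's frequently occurring, non-trivial words, then keeps the highest-scoring sentences. Try giving it a shot.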

If your website is written in Python, coding this algorithm with NLTK (the Natural Language Toolkit) is a fun task.
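To give a feel for the idea, here is a rough sketch of a Luhn-style scorer. It is a simplification, not a faithful reproduction of the paper: the stop-word list, the `min_count` and `max_gap` thresholds, and all function names are illustrative assumptions, and it uses plain regular expressions instead of NLTK so that it runs without downloading any data. Swapping in NLTK's tokenizers and stop-word list would likely give better sentence splitting.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real one would be longer).
STOP_WORDS = frozenset(
    "a an and are as at be but by for from has have i in is it its of on or "
    "that the this to was were will with you your".split()
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and pull out word-like tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def significant_words(sentences, min_count=2):
    """Non-stop-words that occur at least min_count times in the document."""
    counts = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOP_WORDS
    )
    return {w for w, c in counts.items() if c >= min_count}


def score_sentence(words, significant, max_gap=4):
    """Best cluster score: (significant words in cluster)^2 / cluster length.

    A cluster is a run of words that starts and ends with a significant word,
    with at most max_gap other words between neighbouring significant words.
    """
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the current cluster
        else:
            # Close the current cluster and start a new one at pos.
            best = max(best, count * count / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count * count / (prev - start + 1))


def summarize(text, max_sentences=2):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    significant = significant_words(sentences)
    scored = [
        (score_sentence(tokenize(s), significant), i)
        for i, s in enumerate(sentences)
    ]
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    return " ".join(sentences[i] for _, i in sorted(top, key=lambda t: t[1]))
```

Calling `summarize(post_body, max_sentences=1)` on a blog post (where `post_body` is a hypothetical variable holding the plain text) returns a single sentence that could be trimmed to length and used as the meta description. Short posts with few repeated words will score poorly, so falling back to the first sentence in that case is a reasonable design choice.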

theycallmemorty answered Jan 03 '23 14:01
