My final year engineering project requires me to build an application using Java or Python which summarizes a text document using Natural Language Processing. How do I even begin with the programming of such an application?
Based on some research, I've noted that extraction-based summarization will be the best bet for me, since it isn't as complex as abstraction-based algorithms. Even then, it'd be really helpful if someone could point me in the right direction on how to go about this.
Text summarization using the frequency method
In this method we first tokenize the text, then count the frequency of every word and store the word/frequency pairs in a dictionary. Sentences that contain more of the high-frequency words are the ones kept in the final summary.
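The steps above can be sketched in plain Python (the function name and the naive regex tokenization are my own choices; a real project would use a proper tokenizer):

```python
from collections import Counter
import re

def frequency_summarize(text, num_sentences=2):
    """Score sentences by the document-wide frequency of their words."""
    # Naive sentence split on ., !, ? -- replace with a real tokenizer later.
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)  # word -> frequency over the whole document

    # Score = sum of word frequencies, normalized by sentence length
    # so that long sentences don't automatically win.
    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    # Emit the selected sentences in their original order.
    return ". ".join(s for s in sentences if s in top) + "."

text = ("NLP is a field of AI. NLP helps machines read text. "
        "Cats are animals. Summarization with NLP selects key sentences.")
print(frequency_summarize(text))
```

In practice you would also drop stop words ("the", "is", ...) before counting, or they will dominate the frequency table.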
LSA (Latent Semantic Analysis)
Latent Semantic Analysis is an unsupervised learning algorithm that can be used for extractive text summarization.
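A minimal sketch of the LSA idea, assuming you build a term-sentence matrix and use NumPy's SVD (the function name and the simple "take the top singular vector" selection rule are my own simplifications; published LSA summarizers use more refined selection schemes):

```python
import re
import numpy as np

def lsa_summarize(sentences, num_sentences=2):
    """Pick sentences with the largest weight on the top latent topic."""
    # Term-sentence count matrix A (rows: terms, columns: sentences).
    vocab = sorted({w for s in sentences for w in re.findall(r"[a-z']+", s.lower())})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in re.findall(r"[a-z']+", s.lower()):
            A[index[w], j] += 1.0

    # SVD: A = U @ diag(S) @ Vt.  Row k of Vt gives each sentence's
    # weight on latent topic k.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)

    # Simplest rule: rank sentences by |weight| on the strongest topic.
    scores = np.abs(Vt[0])
    top = set(np.argsort(scores)[::-1][:num_sentences])
    return [s for j, s in enumerate(sentences) if j in top]
```

The SVD step is what distinguishes LSA from plain frequency counting: it groups words that co-occur into latent topics, so a sentence can score highly even without repeating the exact top words.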
Text summarization is still an open problem in NLP.
I guess that you might start by asking yourself what the purpose of the summary is, because this will influence the way you generate it.
But as a start you could use the NLTK framework in Python to extract basic elements from a text. For example, you can extract the most frequent words, or the most frequent N-grams (N adjacent words), from the text.
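NLTK provides this via `nltk.FreqDist` and `nltk.ngrams`; the idea itself needs only the standard library, as in this sketch (the function name and regex tokenization are my own):

```python
from collections import Counter
import re

def top_terms(text, n=1, k=3):
    """Return the k most frequent n-grams (n adjacent words) in text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Sliding window of n consecutive words.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams).most_common(k)

text = "the cat sat on the mat and the cat slept"
print(top_terms(text, n=1, k=2))  # most frequent single words
print(top_terms(text, n=2, k=2))  # most frequent bigrams
```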
Also, a simple way to extract the most relevant sentences is to use TF-IDF, which stands for term frequency, inverse document frequency. Basically, this scoring gives higher weight to terms that appear frequently in one document but rarely across the rest of the corpus, so sentences containing such terms are good summary candidates.
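As a sketch, you can treat each sentence as its own "document" and rank sentences by the average TF-IDF of their words (the function name is my own; scikit-learn's `TfidfVectorizer` does the weighting for you in a real project):

```python
import math
import re
from collections import Counter

def tfidf_rank(sentences):
    """Rank sentences by the average TF-IDF weight of their words."""
    docs = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(docs)
    # df[w]: number of sentences that contain word w.
    df = Counter(w for d in docs for w in set(d))
    scored = []
    for s, d in zip(sentences, docs):
        tf = Counter(d)
        # idf = log(n / df) is high for words that appear in few sentences,
        # and zero for words that appear in every sentence.
        total = sum((tf[w] / len(d)) * math.log(n / df[w]) for w in tf)
        scored.append((s, total / max(len(tf), 1)))
    return sorted(scored, key=lambda p: p[1], reverse=True)
```

Here the sentence with the rarest vocabulary scores highest; words shared by all sentences contribute nothing.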
Some Python libraries that you can use:
Some helpful resources:
Hope this helps.
These days, using neural networks to summarize text is considered the state of the art.
Here is an article worth reading: A Neural Attention Model for Sentence Summarization (http://www.aclweb.org/anthology/D15-1044).