I'm looking for a way to automatically produce an abstract, basically the first few sentences or paragraphs of a blog entry, to display in a list of articles (which are written in Markdown). Currently, I'm doing something like this:
    def abstract(article, paras=3):
        return '\n'.join(article.split('\n')[0:paras])
to just grab the first few lines' worth of text, but I'm not totally happy with the results.
What I'm really looking for is about 1/3 of a screenful of formatted text to display in the list of entries. With the algorithm above, the amount of text pulled varies wildly: abstracts of only a line or two are frequently mixed in with more ideally sized ones.
Is there a library that's good at this kind of thing? If not, do you have any suggestions to improve the output?
EDIT:
You can do something like this:
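    from textwrap import wrap

    def getAbstract(text, lines=5, screenwidth=100):
        # Wrap each source line to the screen width, keep the first `lines`
        # wrapped lines, and measure how many characters they span.
        width = len(' '.join([
            line for block in text.splitlines()
            for line in wrap(block, width=screenwidth)
        ][:lines]))
        # Cut the original text at that length and mark the truncation.
        return text[:width] + '...'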
This uses the textwrap algorithm to find a suitable text length. It breaks the text into screen-sized lines and uses the combined length of the first few of them to decide where to cut the original text.
For example, applying this algorithm to the Wikipedia entry for Python:
    print(getAbstract(text, lines=7))
will give you this output:
Python is a general-purpose high-level programming language.[2] Its design philosophy emphasizes code readability.[3] Python claims to "[combine] remarkable power with very clear syntax",[4] and its standard library is large and comprehensive. Its use of indentation as block delimiters is unusual among popular programming languages.
Python supports multiple programming paradigms (primarily object oriented, imperative, and functional) and features a fully dynamic type system and automatic memory management, similar to Perl, Ruby, Scheme, and Tcl. Like other dynamic languages, Python is often used as a scripting...
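The function above always appends '...' and can cut in the middle of a word, because the wrapped lines do not line up exactly with the original text (blank lines between paragraphs, for instance, are dropped by the wrapping step). A minimal sketch of a variant that addresses both points is shown here; the name `get_abstract` and the `ellipsis` parameter are illustrative and not part of the answer above.

```python
from textwrap import wrap

def get_abstract(text, lines=5, screenwidth=100, ellipsis='...'):
    # Wrap each source line separately so paragraph breaks start new lines.
    wrapped = [
        line
        for block in text.splitlines()
        for line in wrap(block, width=screenwidth)
    ]
    if len(wrapped) <= lines:
        # The whole text already fits in the requested number of lines.
        return text
    # Approximate character budget: length of the first `lines` wrapped lines.
    width = len(' '.join(wrapped[:lines]))
    cut = text[:width]
    if not text[width].isspace() and not cut[-1].isspace():
        # The cut landed inside a word: drop the partial word.
        parts = cut.rsplit(None, 1)
        if len(parts) == 2:
            cut = parts[0]
    return cut.rstrip() + ellipsis
```

Short entries come back unchanged, and longer ones end on a whole word. The length is still only an approximation of what will fit on screen once the Markdown is rendered.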
Without further details it's hard to help you. But if your problem is that taking the first few lines gives too much text for some entries, you may want to have a look at textwrap.
For example, if you only want 100-character abstracts, you can do the following:
    import textwrap
    abstract = textwrap.wrap(text, 100)[0]
That will also replace newlines with spaces, which might be desirable depending on your requirements.
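One caveat with the snippet above: `textwrap.wrap` returns an empty list for empty text, so indexing `[0]` would raise an `IndexError`. On Python 3.4 and later, `textwrap.shorten` may be a simpler fit. It collapses whitespace, truncates at a word boundary, and appends a placeholder only when something was cut. A small sketch, where the wrapper name `short_abstract` and its defaults are illustrative:

```python
import textwrap

def short_abstract(text, width=100, placeholder='...'):
    # shorten() collapses runs of whitespace (including newlines) into single
    # spaces, then drops whole words from the end until the result, including
    # the placeholder, fits within `width` characters.
    return textwrap.shorten(text, width=width, placeholder=placeholder)

if __name__ == '__main__':
    print(short_abstract('word ' * 50, width=30))
```

Text that already fits is returned with its whitespace normalized and no placeholder, and empty input yields an empty string rather than an error.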