
Split HTML after N words in Python

Is there any way to split a long string of HTML after N words? Obviously I could use:

' '.join(foo.split(' ')[:n])

to get the first n words of a plain text string, but that might split in the middle of an HTML tag, and it won't produce valid HTML because it won't close any tags that have been opened.
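To make the problem concrete, here is a small sketch of what goes wrong. It uses the sample markup from the example further down, collapsed onto one line; `naive_first_words` is only an illustrative name.

```python
# Sample markup from the example, written on a single line.
html = ('<p>This is some text with a '
        '<a href="http://www.example.com/" title="Example link">'
        'bit of linked text in it</a>.</p>')


def naive_first_words(text, n):
    """Keep the first n space-separated chunks, ignoring markup entirely."""
    return ' '.join(text.split(' ')[:n])


# The cut lands inside the opening <a ...> tag, and <p> is never closed.
print(naive_first_words(html, 7))
```

The result stops at `<a`, in the middle of a tag, which is exactly the kind of output I want to avoid.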

I need to do this in a Zope / Plone site. If something that ships as standard with those products can do it, that would be ideal.

For example, say I have the text:

<p>This is some text with a 
  <a href="http://www.example.com/" title="Example link">
     bit of linked text in it
  </a>.
</p>

If I ask it to split after 5 words, it should return:

<p>This is some text with</p>

And after 7 words:

<p>This is some text with a 
  <a href="http://www.example.com/" title="Example link">
     bit
  </a>
</p>
rjmunro asked Dec 11 '08 16:12



2 Answers

Take a look at the truncate_html_words function in django.utils.text. Even if you aren't using Django, the code there does exactly what you want.
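If pulling in Django is not an option, the same idea can be sketched with only the standard library. This is not Django's implementation, just a minimal illustration built on `html.parser.HTMLParser`; the names `truncate_html` and `_WordLimiter` are made up for this example. It assumes "word" means any whitespace-separated run of text, and it drops comments and doctype declarations. In later Django releases the equivalent functionality is, as far as I know, reached through `django.utils.text.Truncator` rather than `truncate_html_words`.

```python
import re
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _WordLimiter(HTMLParser):
    """Re-emit markup until `limit` words of text have been seen."""

    def __init__(self, limit):
        super().__init__()          # character references arrive decoded
        self.limit = limit
        self.count = 0              # words emitted so far
        self.out = []               # output fragments
        self.open_tags = []         # stack of tags still awaiting a close
        self.done = limit <= 0

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return                  # past the limit, or a stray closing tag
        while True:                 # close anything left open inside `tag`
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        kept = []
        for piece in re.split(r"(\s+)", data):   # keep whitespace as-is
            kept.append(piece)
            if piece and not piece.isspace():
                self.count += 1
                if self.count == self.limit:
                    self.done = True
                    break
        # Text was decoded by the parser, so re-escape it on the way out.
        self.out.append(escape("".join(kept), quote=False))


def truncate_html(html, limit):
    """Return `html` cut after `limit` words, with open tags closed."""
    parser = _WordLimiter(limit)
    parser.feed(html)
    parser.close()                  # flush any buffered trailing text
    closers = "".join(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out) + closers
```

Calling `truncate_html(html, 5)` on the question's sample markup (written on one line) should give `<p>This is some text with</p>`. The sketch stops emitting as soon as the limit is reached and then closes whatever is still on the stack, which is why the output stays well-formed even when the cut falls inside a nested element.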

Carl Meyer answered Sep 24 '22 03:09



I've heard that Beautiful Soup is very good at parsing HTML. It will probably be able to help you get well-formed HTML out.

recursive answered Sep 25 '22 03:09
