
rails: get a teaser/excerpt for an article

I have a page that lists news articles. To cut down on the page's length, I only want to display a teaser (the first 200 words / 600 letters of the article) followed by a "more..." link that, when clicked, expands the rest of the article in a jQuery/JavaScript way. I have all of that figured out, and I even found the following helper method on some paste page, which makes sure that the news article (a string) is not chopped up right in the middle of a word:

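  # Cuts string to at most count characters, drops the last (possibly
  # partial) word and appends ' ...'.
  def shorten(string, count = 30)
    if string.length >= count
      shortened = string[0, count]
      splitted = shortened.split(/\s/)
      words = splitted.length
      splitted[0, words - 1].join(" ") + ' ...'
    else
      string
    end
  end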

The problem is that the news article bodies I get from the DB are formatted HTML. So if I'm unlucky, the above helper will chop up my article string right in the middle of an HTML tag and insert the "more..." string there (e.g. partway through a tag), which will corrupt the HTML on the page.
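A minimal sketch of that failure, assuming the helper above and a made-up article body (the link markup is purely illustrative):

```ruby
# The helper from above, repeated so the snippet runs on its own.
def shorten(string, count = 30)
  if string.length >= count
    shortened = string[0, count]
    splitted = shortened.split(/\s/)
    words = splitted.length
    splitted[0, words - 1].join(" ") + ' ...'
  else
    string
  end
end

# Hypothetical article body; any HTML with attributes shows the same effect.
body = '<p>Read the <a href="/news/42" class="more">full story</a> today.</p>'

# The 30-character cut lands inside the <a ...> tag, so the teaser ends
# with an unterminated tag and an unclosed <p>.
teaser = shorten(body, 30)
puts teaser
# prints: <p>Read the <a ...
```

The result is no longer valid HTML: the `<a` tag is never terminated and `<p>` is never closed.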

Is there any way around this or is there a plugin out there that I can use to generate excerpts/teasers from an HTML string?

Sebastian asked Feb 11 '09 12:02



3 Answers

You can use a combination of Rails' sanitize and truncate view helpers.

truncate("And they found that many people were sleeping better.",
  :omission => "... (continued)", :length => 25)
# => And they f... (continued)

I'm doing a similar task where I have blog posts and I just want to show a quick excerpt. So in my view I simply do:

sanitize(truncate(blog_post.body, length: 150))

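That gives me at most 150 characters (the "..." that truncate appends counts toward that limit), removes any tags that are not on sanitize's list of allowed tags, and is handled in the view, so it's MVC friendly. Note that sanitize keeps allowed tags such as <b> or <a>; if you want every tag removed, strip_tags is the helper for that, and applying it before truncate avoids cutting in the middle of a tag.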
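The same idea (drop the markup first, then cut the plain text) can be sketched in plain Ruby without Rails. Here `plain_teaser` is a hypothetical helper, not a Rails API, and the regex-based tag removal is a simplifying assumption rather than a real sanitizer:

```ruby
# Hypothetical helper (not a Rails API): removes markup, then cuts the
# remaining plain text at a word boundary.
def plain_teaser(html, length = 150, omission = '...')
  # Naive tag removal via regex; fine for a sketch, but no substitute
  # for a real HTML sanitizer.
  text = html.gsub(/<[^>]*>/, ' ').gsub(/\s+/, ' ').strip
  return text if text.length <= length

  # Leave room for the omission so the result never exceeds length.
  room = [length - omission.length, 0].max
  cut = text[0, room]
  # Back up to the last space so no word is chopped in half. This may
  # drop one whole word when the cut happens to land on a word boundary.
  last_space = cut.rindex(' ')
  cut = cut[0, last_space] if last_space
  cut + omission
end

html = '<p>And they <b>found</b> that many people were sleeping better.</p>'
puts plain_teaser(html, 25)
# prints: And they found that...
```

Because the tags are gone before the cut is made, the teaser cannot end inside a tag, at the cost of losing all formatting in the excerpt.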

Good luck!

mwilliams answered Oct 20 '22 05:10



My answer here should work. The original question (err, asked by me) was about truncating Markdown, but I ended up converting the Markdown to HTML and then truncating that, so it should apply here too.

Of course, if your site gets much traffic, you should cache the excerpt (perhaps by storing it in the database when the post is created or updated?). This would also mean you could allow the user to modify or enter their own excerpt.
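A minimal sketch of that caching idea, using a plain Ruby class as a stand-in for a model. The `Post` class, its `save` method and `build_excerpt` are hypothetical names for illustration; in a Rails app this logic would more likely live in a model callback that writes to an excerpt column:

```ruby
# Hypothetical stand-in for a model with a stored excerpt.
class Post
  attr_reader :body, :excerpt

  def initialize(body, excerpt: nil)
    @body = body
    @custom_excerpt = excerpt # author-supplied excerpt, if any
    @excerpt = nil
  end

  # Computes the excerpt once, at save time, instead of on every page view.
  # An author-supplied excerpt takes precedence over the generated one.
  def save
    @excerpt = @custom_excerpt || build_excerpt(@body)
    self
  end

  private

  # Naive generator (an assumption for this sketch): strip tags with a
  # regex, then keep the first length characters of the plain text.
  def build_excerpt(html, length = 40)
    text = html.gsub(/<[^>]*>/, ' ').gsub(/\s+/, ' ').strip
    text.length > length ? text[0, length].rstrip + '...' : text
  end
end

generated = Post.new('<p>A <b>fairly</b> long article body that should be cut somewhere.</p>').save
puts generated.excerpt

custom = Post.new('<p>Body</p>', excerpt: 'Hand-written teaser').save
puts custom.excerpt
# prints: Hand-written teaser
```

The list page then only reads the stored excerpt, so the truncation cost is paid once per edit rather than once per request.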

Usage:

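>> puts "<p><b><a href=\"hi\">Something</a></p>".truncate_html(5, "...")
=> <p><b><a href="hi">Somet...</a></b></p>

..and the code (copied from the other answer, with two small fixes: the text slice no longer returns one character too many, and the at_end string is actually used):

require 'rexml/parsers/pullparser'

class String
  # Keeps roughly len characters of text, copies tags through untouched
  # and closes any tags that are still open at the cut.
  # at_end (e.g. "...") is appended to the text when given.
  def truncate_html(len = 30, at_end = nil)
    parser = REXML::Parsers::PullParser.new(self)
    tags = []          # stack of currently open tags
    new_len = len      # text characters still allowed
    results = ''
    while parser.has_next? && new_len > 0
      event = parser.pull
      case event.event_type
      when :start_element
        tags.push event[0]
        results << "<#{tags.last}#{attrs_to_s(event[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        # Exclusive range, so at most new_len characters are copied.
        results << event[0][0...new_len]
        new_len -= event[0].length
      else
        results << "<!-- #{event.inspect} -->"
      end
    end
    results << at_end if at_end
    # Close whatever is still open, innermost tag first.
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end

  private

  # Serializes an attribute hash back into ' name="value"' pairs.
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end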

dbr answered Oct 20 '22 05:10



Thanks a lot for your answers! However, in the meantime I stumbled upon the jQuery HTML Truncator plugin, which fits my purposes perfectly and shifts the truncation to the client side. It doesn't get any easier :-)

Sebastian answered Oct 20 '22 04:10