Convert HTML to proper plain text?

Is there any way to convert HTML into proper plain text? I have tried everything from raw to sanitize, and even the Mail gem with its text_part method, which is supposed to do exactly that but doesn't work for me.

My best attempt so far was strip_tags(strip_links(resource.body)), but block elements such as <p> and <ul> were not converted correctly: the line breaks between them are lost.

This is more or less what I have in HTML:

Hello

This is some text. Blah blah blah.

Address:
John Doe
10 ABC Street
Whatever City

New Features
- Feature A
- Feature B
- Feature C
Check this out: http://www.google.com

Best,
Admin

which converts to something like

Hello
This is some text. Blah blah blah.
Address: John Doe 10 ABC Street Whatever City

New Features Feature A Feature B Feature C
Check this out: http://www.google.com

Best, Admin

Any idea?

Cojones asked Sep 18 '13 08:09


2 Answers

Rails has strip_tags (available in 4.2.1 and in earlier versions), a built-in view helper in ActionView::Helpers::SanitizeHelper for stripping HTML tags.

Some examples:

strip_tags("Strip <i>these</i> tags!")

=> Strip these tags!

strip_tags("<b>Bold</b> no more!  <a href='more.html'>See more here</a>...")

=> Bold no more!  See more here...

strip_tags("<div id='top-bar'>Welcome to my website!</div>")

=> Welcome to my website!

Check it out in the API docs.

klaoha06 answered Nov 13 '22 15:11


Found the solution here: https://github.com/alexdunae/premailer/blob/master/lib/premailer/html_to_plain_text.rb

Works like a charm!
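
For anyone who wants to see the general idea without pulling in the gem, here is a minimal sketch of the same kind of conversion in plain Ruby. It is not Premailer's code: html_to_plain_text below is a hypothetical helper, the regexes only cover the tags from the question (<p>, <br>, <ul>/<li>, headings, links), and regex-based HTML handling will break on unusual markup. The sample HTML is also an assumption about what the original markup looked like.

```ruby
require "cgi"

# Hypothetical helper (not Premailer's API): a rough, regex-based
# HTML-to-text conversion that keeps paragraph breaks, <br> line breaks,
# and list items instead of flattening everything onto one line.
def html_to_plain_text(html)
  text = html.dup

  # Collapse source whitespace first; HTML treats runs of it as one space.
  text.gsub!(/\s+/, " ")

  # Links: keep the label and append the URL, unless the label is the URL.
  text.gsub!(/<a\b[^>]*\bhref=["']([^"']*)["'][^>]*>(.*?)<\/a>/i) do
    url = $1
    label = $2.strip
    (label.empty? || label == url) ? url : "#{label} ( #{url} )"
  end

  # Explicit line breaks.
  text.gsub!(/<br\s*\/?>/i, "\n")

  # Each list item starts on its own line with a dash.
  text.gsub!(/<li\b[^>]*>/i, "\n- ")

  # Closing block-level tags end with a blank line.
  text.gsub!(/<\/(p|div|ul|ol|h[1-6])>/i, "\n\n")

  # Drop every remaining tag, then decode entities such as &amp;.
  text.gsub!(/<[^>]+>/, "")
  text = CGI.unescapeHTML(text)

  # Tidy up: trim each line and squeeze 3+ newlines down to one blank line.
  text = text.lines.map(&:strip).join("\n")
  text.gsub(/\n{3,}/, "\n\n").strip
end

# Assumed markup, loosely reconstructed from the question.
html = <<~HTML
  <h1>Hello</h1>
  <p>This is some text. Blah blah blah.</p>
  <p>Address:<br>John Doe<br>10 ABC Street<br>Whatever City</p>
  <h2>New Features</h2>
  <ul>
    <li>Feature A</li>
    <li>Feature B</li>
    <li>Feature C</li>
  </ul>
  <p>Check this out: <a href="http://www.google.com">http://www.google.com</a></p>
  <p>Best,<br>Admin</p>
HTML

puts html_to_plain_text(html)
```

With this input the address lines and the list items each end up on their own line, and paragraphs are separated by a blank line. The key difference from strip_tags is that block tags are turned into newlines before the remaining tags are removed. For real mail bodies the linked Premailer file handles many more cases.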

Cojones answered Nov 13 '22 17:11
