How would you strip HTML tags in PostgreSQL such that the data inside the tags is preserved?
I found some solutions by googling, but they were stripping the text between the tags too!
In JavaScript there are several ways to strip all the HTML tags from a string: the replace() function with a regular expression, or the .textContent and .innerText properties of the HTML DOM.
stripHtml(html): Changes the provided HTML string into a plain-text string by converting <br>, <p>, and <div> to line breaks, stripping all other tags, and converting escaped characters into their display values.
select regexp_replace(content, E'<[^>]+>', '', 'gi') from message;
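To check that this pattern removes only the tags and keeps the text between them, it can be run against a literal string first. The sample string below is made up for illustration. Note that the 'i' flag in the query above has no effect, since the pattern contains no letters, so the sketch uses 'g' alone.

```sql
-- Made-up input: every <...> run is deleted, the text between tags is kept.
SELECT regexp_replace('<p>Hello <b>world</b>!</p>', '<[^>]+>', '', 'g') AS stripped;
-- returns 'Hello world!'
```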
Store your content in the XML datatype rather than as "second-class" TEXT: it is very simple to convert HTML into XHTML (see HTML-Tidy, or the standard DOM's loadHTML() and saveXML() methods).
IT IS FAST AND VERY SAFE!
The common information-retrieval need is not the full content but something inside the XHTML, so the power of xpath is welcome.
Example: retrieve all paragraphs with class="fn":
WITH needinfo AS (
    SELECT *, xpath('//p[@class="fn"]//text()', xhtml)::text[] AS frags
    FROM t
)
SELECT array_to_string(frags, ' ') AS my_p_fn2txt
FROM needinfo
WHERE array_length(frags, 1) > 0;
-- for full content use xpath('//text()', xhtml)
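The query above can be tried without an existing table. The following is a minimal sketch that assumes a PostgreSQL build with XML (libxml) support; the inline rows and the id column are hypothetical stand-ins for the table t.

```sql
-- Hypothetical sample rows standing in for the table t.
WITH t(id, xhtml) AS (
    VALUES
        (1, '<div><p class="fn">Jane<b>Doe</b></p><p>ignored</p></div>'::xml),
        (2, '<div><p>no fn paragraph here</p></div>'::xml)
),
needinfo AS (
    -- Collect every text node found under a <p class="fn"> element.
    SELECT id, xpath('//p[@class="fn"]//text()', xhtml)::text[] AS frags
    FROM t
)
SELECT id, array_to_string(frags, ' ') AS my_p_fn2txt
FROM needinfo
WHERE array_length(frags, 1) > 0;
-- Row 1 yields 'Jane Doe'; row 2 is filtered out because array_length
-- of an empty array is NULL.
```

Because the fragments are joined with a single space, text nodes that already carry their own leading or trailing whitespace will produce doubled spaces in the result.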
I do not recommend the regex approach, because it is not an "information retrieval" solution... and, as @James and others commented here, the regex solution is not so safe.
I like "pure SQL"; for me it is better than using Perl (see @Daniel's solution) or another language.
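CREATE OR REPLACE FUNCTION strip_tags(TEXT) RETURNS TEXT AS $$
    -- Inner call: replace the first tag that has an alt attribute with its alt text.
    -- Outer call: remove every remaining tag.
    -- Backslashes are doubled because E'' strings treat a single backslash as an escape.
    SELECT regexp_replace(
        regexp_replace($1, E'(?x)<[^>]*?(\\s alt \\s* = \\s* ([\'"]) ([^>]*?) \\2) [^>]*? >', E'\\3'),
        E'(?x)(< [^>]*? >)', '', 'g')
$$ LANGUAGE SQL;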
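A quick way to try the function, and to see why the regex approach is called unsafe above, is a pair of queries like the following. The sample strings are made up for illustration. The second query uses the plain single-pass pattern, which treats the first > it meets as the end of a tag, even inside a quoted attribute value.

```sql
-- Made-up input: the img tag is meant to be replaced by its alt text,
-- and the remaining tags removed.
SELECT strip_tags('<p>Logo: <img src="a.png" alt="ACME"> here</p>');

-- Pitfall: a '>' inside an attribute value ends the match too early,
-- so part of the tag leaks into the result.
SELECT regexp_replace('<a title="a > b">link</a>', '<[^>]+>', '', 'g');
-- returns ' b">link' rather than 'link'
```

Input of this kind is one reason to prefer the XML datatype with xpath when the markup is not under your control.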
See this and many other variations at siafoo.net, eskpee.wordpress, ... and here at Stack Overflow.