We are storing blocks of HTML content in MySQL (it is formatted text created in CKEditor, which adds inline CSS styles to format the text). We are storing the content in a column called "page_content".
We want a search feature that lets users search the text only. However, we don't want the search to match HTML markup, which is what is happening right now. For example, if we search for "font" we don't want the results to include pages that only match because they contain the HTML tag <font>.
Is there some way to ignore HTML/CSS characters when searching HTML stored content from MySQL?
Have you considered setting up a separate table for these searches? In MySQL versions before 5.6, full text search only works with MyISAM tables, so you probably don't want to mix the full text search with important data (unless, of course, you have some strange distaste for foreign keys and referential integrity).
The approach I've used in the past is basically this:
Create a search table with two columns, id and search_text.
id matches the id of the thing you're searching.
search_text is everything (body text, title, author name, ...) that you want to search, mashed into one blob of text.
Build the search_text string as plain text; this is where you would strip out the HTML and possibly apply some other mappings (such as expanding things like "A+" to something that the full text search will find).
This solves your HTML problem, lets you search more than the HTML content easily, and allows you to tune the search results by weighting different components of the search text through repetition (e.g. if you want tags to be more important than the body text, just add the tags two or three times when building the search_text).
You'll have to process your text to remove or ignore the HTML. This approach lets you do it just once rather than doing it on every search.
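
As a rough illustration of the "strip once, store as plain text" step, here is a minimal Python sketch using only the standard library. The function names (strip_html, build_search_text) and the tag_weight parameter are made up for this example and are not part of any existing code; it assumes the application layer builds search_text whenever a page is saved.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent blocks (<p>a</p><p>b</p>)
        # do not fuse into one word, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_search_text(body_html, title="", tags=(), tag_weight=2):
    """Mash title, plain body text and tags into one blob.

    Tags are repeated tag_weight times so they count for more
    in the full text ranking (weighting through repetition).
    """
    parts = [title, strip_html(body_html)]
    parts.extend(list(tags) * tag_weight)
    return " ".join(p for p in parts if p)
```

Calling build_search_text(page_content, title, tags) on every save and writing the result to search_text means the HTML is stripped once per edit instead of once per search.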
I am guessing you want to do the searching on the database? (in that case, stripping HTML tags would mean that you would have to store the content twice).
Try looking into MySQL's full text search functionality, in natural language mode.
http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html
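
For what a natural language query might look like from application code, here is a small Python sketch that only builds the SQL and its parameters. The table name pages and the id column are assumptions for illustration (only page_content comes from the question), the %s placeholders assume a driver that uses that parameter style, and the column needs a FULLTEXT index for MATCH to work.

```python
def natural_language_query(term, limit=20):
    """Build a parameterized full text query (hypothetical helper).

    AGAINST with no modifier runs in natural language mode, which
    is the default. Table and column names here are assumptions.
    """
    sql = (
        "SELECT id, MATCH(page_content) AGAINST (%s) AS score "
        "FROM pages "
        "WHERE MATCH(page_content) AGAINST (%s) "
        "ORDER BY score DESC LIMIT %s"
    )
    # The search term is passed twice: once for the score, once for the filter.
    return sql, (term, term, int(limit))
```

The returned pair would be handed to a cursor as cursor.execute(sql, params). Pointing the same query at a column that holds plain text rather than raw HTML is what keeps words like "font" in the markup out of the results.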