I am writing a Dexterity content type which contains plain text and HTML fields. I want to have a custom SearchableText() method which exposes these fields to portal_catalog and Plone full text search.
I assume that for plain text I can just join the strings with spaces. But how should I preprocess HTML content when exposing it in SearchableText()?
For converting data in Plone there is a tool called portal_transforms, which is quite capable at converting content between MIME types (depending on your OS and installation it may also be able to convert .doc, .pdf, etc.):
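from Products.CMFCore.utils import getToolByName

# `html` is the HTML string to index, e.g. the raw value of the HTML field
transforms = getToolByName(self.context, 'portal_transforms')
stream = transforms.convertTo('text/plain', html, mimetype='text/html')
# convertTo() returns None when no suitable transform chain is found
text = stream.getData().strip() if stream is not None else ''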
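To show how the converted HTML and the plain text fields could be combined into one SearchableText() value, here is a minimal sketch that runs outside Plone. It is not the portal_transforms implementation: it swaps in Python's standard html.parser as a stand-in converter, and the names html_to_text, searchable_text and to_text are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space so text from adjacent elements does not
        # fuse, then collapse all runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Stand-in converter (assumption): stdlib only, not portal_transforms."""
    parser = _TextExtractor()
    parser.feed(html or "")
    parser.close()
    return parser.text()


def searchable_text(plain_values, html_values, to_text=html_to_text):
    """Join plain text values and converted HTML values with spaces."""
    parts = [value.strip() for value in plain_values if value]
    parts += [to_text(value) for value in html_values if value]
    return " ".join(part for part in parts if part)
```

Inside a Plone site, the to_text argument could presumably be a small function wrapping the portal_transforms call shown above, so that the joining logic stays the same while the conversion is done by Plone.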
For indexing fields in Dexterity I suggest using collective.dexteritytextindexer (note that it has no through-the-web (TTW) support at the moment): http://pypi.python.org/pypi/collective.dexteritytextindexer and https://github.com/collective/collective.dexteritytextindexer
cheers