
Lucene index field value is stripped of all html tags

I have a Lucene index in which one of the fields is mapped to Sitecore's rich text field.

Since this field value contains html content for most of the items sharing the template, I expected html content to be returned when fetching the item's field value. However, I noticed that the value returned is stripped of all html tags.

I tried changing the INDEXTYPE to "UNTOKENIZED", but this did not solve the problem. I understand that Lucene does this to allow searching on that field, but that is not a requirement in my case, and I want to override this behavior.

Sairaj asked May 24 '16 12:05


1 Answer

This happens because a RichTextFieldReader is assigned to the html and rich text field types:

<fieldReader 
    fieldTypeName="html|rich text"                                     
    fieldNameFormat="{0}"
    fieldReaderType="Sitecore.ContentSearch.FieldReaders.RichTextFieldReader, Sitecore.ContentSearch" />

In Sitecore 8.1 it's defined in Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config.

That reader strips out all the tags using HtmlField.GetPlainText().
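To make the effect concrete, here is a rough Python approximation of what reducing a rich text value to plain text does. It uses the standard library's html.parser and is only an illustration: it is not Sitecore's HtmlField.GetPlainText() implementation, and the helper name get_plain_text is hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Keeps text nodes and silently drops every tag."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def get_plain_text(html: str) -> str:
    """Hypothetical helper: return only the text content of an HTML fragment."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


rich_text = '<p>Hello <strong>world</strong> &amp; friends</p>'
print(get_plain_text(rich_text))  # Hello world & friends
```

Every tag is dropped and only the text survives, which is why the value read back from the index no longer contains any markup.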

You can try adding another section at the same level as the <mapFieldByTypeName hint="raw:AddFieldReaderByFieldTypeName"> section, for example:

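<mapFieldByFieldName hint="raw:AddFieldReaderByFieldName">
    <!-- Map a single field by name to the reader that keeps the raw value -->
    <fieldReader 
        fieldName="yourFieldName"
        fieldReaderType="Sitecore.ContentSearch.FieldReaders.DefaultFieldReader, Sitecore.ContentSearch" />
</mapFieldByFieldName>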

Mapping by field name has higher priority than mapping by field type, so Sitecore will use the field reader specified for your field instead of the one specified for your field's type.
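The precedence rule can be pictured as a two-step lookup: first check for a reader registered under the field's name, and fall back to the reader registered for its field type only if there is none. The Python sketch below is a hypothetical model of that rule, not Sitecore's FieldReaderMap code; the class, its methods, the fallback reader and the case-insensitive matching are all assumptions made for illustration.

```python
from typing import Dict


class FieldReaderMap:
    """Hypothetical model of reader lookup; not Sitecore's actual class."""

    def __init__(self, default_reader: str) -> None:
        self.by_name: Dict[str, str] = {}
        self.by_type: Dict[str, str] = {}
        # Assumption: fields with no explicit mapping fall back to this reader
        self.default_reader = default_reader

    def add_by_type(self, type_names: str, reader: str) -> None:
        # One entry such as "html|rich text" covers several field types
        for type_name in type_names.split("|"):
            self.by_type[type_name.strip().lower()] = reader

    def add_by_name(self, field_name: str, reader: str) -> None:
        self.by_name[field_name.lower()] = reader

    def resolve(self, field_name: str, field_type: str) -> str:
        # A mapping by field name wins over a mapping by field type
        # (case-insensitive matching is an assumption of this model)
        if field_name.lower() in self.by_name:
            return self.by_name[field_name.lower()]
        return self.by_type.get(field_type.lower(), self.default_reader)


readers = FieldReaderMap(default_reader="DefaultFieldReader")
readers.add_by_type("html|rich text", "RichTextFieldReader")
readers.add_by_name("yourFieldName", "DefaultFieldReader")

print(readers.resolve("yourFieldName", "Rich Text"))  # DefaultFieldReader
print(readers.resolve("Body", "Rich Text"))           # RichTextFieldReader
```

Because the name lookup runs first, the type mapping stays in force for every other rich text field and only the field you name keeps its HTML.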

Marek Musielak answered Nov 04 '22 12:11