I'm using Elasticsearch, and writing my own wrapper using WebRequest since NEST (the usual choice) bafflingly seems to lack the ability to insert an item and have the generated ID returned.
Anyway - no problems with the general method. But any HTML content is indexed as-is, i.e. if I have <strong>test</strong> in a field, then a search for the query "strong" returns the item.
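To make the symptom concrete, here is a rough sketch of why "strong" matches. It is not Elasticsearch's actual analyzer: `strip_html` and `tokenize` are hypothetical helpers that only approximate an HTML-stripping char filter and a word tokenizer, assuming the tokenizer splits on anything that is not a letter or digit.

```python
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and drops the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    # Hypothetical stand-in for an html_strip-style char filter.
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    # Rough approximation of a word tokenizer: lowercase runs of letters/digits.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


raw = "<strong>test</strong>"
tokens_raw = tokenize(raw)                    # tag names end up as terms
tokens_stripped = tokenize(strip_html(raw))   # only the visible text remains

print(tokens_raw)
print(tokens_stripped)
```

Without the stripping step, the tag name itself becomes an indexed term, which is why a query for "strong" finds the item.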
I've put this in elasticsearch.yml, based on a random message board post I found:
index:
  analysis:
    analyzer:
      htmlContentAnalyzer:
        type: custom
        tokenizer: standard
        filter: standard
        char_filter: html_strip
Then I create a mapping for my index 'content', item type 'news':
PUT http://localhost:9200/content/news/_mapping
{
  "news" : {
    "properties" : {
      "TextContent" : {
        "type" : "string",
        "index" : "analyzed",
        "analyzer" : "htmlContentAnalyzer",
        "store" : "yes"
      }
    }
  }
}
The store/yes is just for "fun"; it makes no difference. The above gives me a 200 OK.
However, the search still returns the same results: a query for "strong" still matches the item.
What doesn't help is that elasticsearch documentation seems appalling. Check out this page:
http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping.html
it gives you a brief rundown of what mapping is, and says more details are in the mapping section, i.e. this page:
http://www.elasticsearch.org/guide/reference/mapping/
...which seems to be truly terrible. There's nothing referring to the format/object graph I found - no mention of "properties", "type", "analyzer", "index" etc. There are some sections on the menu on the right, e.g. "_index", but they seem to refer to the item as a whole? And where is that pointed out?
So my question is on two fronts:
With all credit to chrismale on #elasticsearch (freenode IRC): searching against _all is no good, because that field is indexed with its own analyzer. Querying on my TextContent field specifically worked as expected.
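As an illustration of querying the field directly, here is a minimal sketch that builds a search body scoped to TextContent rather than _all. The helper name `field_query` is made up for this example, and it assumes a `query_string` query with `default_field`; the resulting JSON would be sent to something like http://localhost:9200/content/news/_search by whatever HTTP client you use.

```python
import json


def field_query(field, text):
    """Build a search body that targets one field instead of _all."""
    return {
        "query": {
            "query_string": {
                "default_field": field,  # use this field's analyzer, not _all's
                "query": text,
            }
        }
    }


body = field_query("TextContent", "strong")
print(json.dumps(body, indent=2))
```

With the html_strip analyzer on TextContent, a body like this should no longer match on tag names, whereas the same query with no field specified falls back to _all.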