ElasticSearch & attachment type (NEST C#)

I'm trying to index a PDF document with Elasticsearch/NEST.

The file is indexed, but the search returns 0 hits.

I need the search result to return only the document Id and the highlight result (without the base64 content).

I'd appreciate any help. Thanks!

Here is the code:

class Program
{
    static void Main(string[] args)
    {
        // create es client
        string index = "myindex";

        var settings = new ConnectionSettings("localhost", 9200)
            .SetDefaultIndex(index);
        var es = new ElasticClient(settings);

        // delete index if any
        es.DeleteIndex(index);

        // index document
        string path = "test.pdf";
        var doc = new Document()
        {
            Id = 1,
            Title = "test",
            Content = Convert.ToBase64String(File.ReadAllBytes(path))
        };

        var parameters = new IndexParameters() { Refresh = true };
        if (es.Index<Document>(doc, parameters).OK)
        {
            // search in document
            string query = "semantic"; // test.pdf contains the string "semantic"

            var result = es.Search<Document>(s => s
                .Query(q =>
                    q.QueryString(qs => qs
                        .Query(query)
                    )
                )
                .Highlight(h => h
                    .PreTags("<b>")
                    .PostTags("</b>")
                    .OnFields(
                      f => f
                        .OnField(e => e.Content)
                        .PreTags("<em>")
                        .PostTags("</em>")
                    )
                )
            );

            if (result.Hits.Total == 0)
            {
            }
        }
    }
}

[ElasticType(
    Name = "document",
    SearchAnalyzer = "standard",
    IndexAnalyzer = "standard"
)]
public class Document
{
    public int Id { get; set; }

    [ElasticProperty(Store = true)]
    public string Title { get; set; }

    [ElasticProperty(Type = FieldType.attachment,
        TermVector = TermVectorOption.with_positions_offsets)]
    public string Content { get; set; }
}
Yossi Cohen asked Feb 08 '13 21:02



1 Answer

Install the Attachment Plugin and restart ES

bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/2.3.2

Create an Attachment class whose properties map to the fields described in the attachment plugin documentation

  public class Attachment
  {
      [ElasticProperty(Name = "_content")]
      public string Content { get; set; }

      [ElasticProperty(Name = "_content_type")]
      public string ContentType { get; set; }

      [ElasticProperty(Name = "_name")]
      public string Name { get; set; }
  }

On the Document class you are indexing, replace the string Content property with a property named "File" of type Attachment, mapped as an attachment field

  [ElasticProperty(Type = FieldType.Attachment, TermVector = TermVectorOption.WithPositionsOffsets, Store = true)]
  public Attachment File { get; set; }

Create your index explicitly before you index any instances of your class. If you don't do this, it will use dynamic mapping and ignore your attribute mapping. If you change your mapping in the future, always recreate the index.

  client.CreateIndex("index-name", c => c
     .AddMapping<Document>(m => m.MapFromAttributes())
  );

Index your item

  string path = "test.pdf";

  var attachment = new Attachment();
  attachment.Content = Convert.ToBase64String(File.ReadAllBytes(path));
  attachment.ContentType = "application/pdf";
  attachment.Name = "test.pdf";

  var doc = new Document()
  {
      Id = 1,
      Title = "test",
      File = attachment
  };
  client.Index<Document>(doc);
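
To make the indexed document's shape concrete, here is a minimal sketch that builds the JSON body by hand with System.Text.Json instead of NEST. The lower-case outer names ("id", "title", "file") are an assumption about how the client serializes the Document class (it matches the search on "file" in the next step), the class and method names are hypothetical, and the stand-in bytes replace a real test.pdf so the sketch runs anywhere.

```csharp
using System;
using System.Text;
using System.Text.Json;

static class AttachmentPayloadSketch
{
    // Builds the JSON body for one document. The inner property names
    // (_content, _content_type, _name) mirror the Attachment class above;
    // the outer lower-case names are an assumption, not confirmed NEST output.
    public static string BuildJson(int id, string title, byte[] fileBytes,
                                   string contentType, string name)
    {
        var body = new
        {
            id,
            title,
            file = new
            {
                _content = Convert.ToBase64String(fileBytes), // base64, as the plugin expects
                _content_type = contentType,
                _name = name
            }
        };
        return JsonSerializer.Serialize(body);
    }

    static void Main()
    {
        // Stand-in bytes so the sketch runs without a real test.pdf on disk.
        byte[] bytes = Encoding.UTF8.GetBytes("hello");
        Console.WriteLine(BuildJson(1, "test", bytes, "application/pdf", "test.pdf"));
    }
}
```

Comparing this shape with what actually reaches Elasticsearch (for example by inspecting the request in a proxy) is a quick way to check that the attribute mapping and the serialized field names agree.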

Search on the File property

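  // Term queries are not analyzed, so pass the term in lowercase:
  // the standard analyzer lowercases tokens at index time.
  var query = Query<Document>.Term("file", "searchterm");

  // start and count are the paging offset and page size.
  var searchResults = client.Search<Document>(s => s
          .From(start)
          .Size(count)
          .Query(query)
  );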
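
The question also asks for hits that carry only the document Id and the highlight, without the base64 content. The sketch below is one possible way to do that with a raw search request rather than NEST, under several assumptions: the index is "myindex" and the type is "document" (taken from the question), the node runs on localhost:9200, and the Elasticsearch version is old enough that a "fields" list in the request body suppresses the full _source (newer versions use different options). The class and method names are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

static class HighlightSearchSketch
{
    // Builds a search body that queries the attachment field, returns only
    // the "title" field per hit, and asks for highlights on "file".
    public static string BuildSearchBody(string queryText)
    {
        var body = new
        {
            fields = new[] { "title" },            // assumption: suppresses _source on old ES versions
            query = new
            {
                query_string = new { query = queryText, default_field = "file" }
            },
            highlight = new
            {
                fields = new { file = new { } }    // default highlight settings for "file"
            }
        };
        return JsonSerializer.Serialize(body);
    }

    static async Task Main()
    {
        string json = BuildSearchBody("semantic");
        Console.WriteLine(json);

        // Requires a running node; the URL below is an assumption based on the question.
        using var http = new HttpClient();
        var content = new StringContent(json, Encoding.UTF8, "application/json");
        var response = await http.PostAsync("http://localhost:9200/myindex/document/_search", content);
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```

Each hit in the response should then contain _id plus a highlight section. Highlighting on "file" relies on the Store = true and term vector settings in the mapping above, because the _source only holds the base64 text.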
mmols answered Oct 06 '22 21:10