
How do I check for duplicate data on Elasticsearch?

When storing a batch of documents, I want to store only the ones that don't exist yet and ignore the rest. Should this be done at the application level, for example by checking whether a document's id already exists?

asked Jan 13 '13 at 03:01 by Matías Insaurralde




1 Answer

Here is what the documentation states:

Operation Type

The index operation also accepts an op_type that can be used to force a create operation, allowing for “put-if-absent” behavior. When create is used, the index operation will fail if a document by that id already exists in the index.

Here is an example of using the op_type parameter:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -H 'Content-Type: application/json' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}'
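If a document with that id already exists, Elasticsearch rejects such a request with HTTP 409 Conflict instead of overwriting it, so an application can simply treat 409 as "already stored, skip it". Below is a minimal sketch of that in Python using only the standard library. The helper name `create_if_absent`, the base URL and the index name are illustrative assumptions; the sketch uses the typeless `/<index>/_doc/<id>?op_type=create` form of Elasticsearch 7 and later, and sends the `Content-Type: application/json` header that Elasticsearch requires since 6.0.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def create_if_absent(base_url: str, index: str, doc_id: str, doc: dict) -> bool:
    """Store `doc` under `doc_id` only if that id is not already taken.

    Returns True when Elasticsearch created the document and False when it
    refused with 409 Conflict because a document with that id already exists.
    """
    # Typeless "put-if-absent" endpoint (Elasticsearch 7+); older versions put
    # the type in the path instead, as in the curl example above.
    safe_id = urllib.parse.quote(doc_id, safe="")  # ids may contain '/' etc.
    url = f"{base_url}/{index}/_doc/{safe_id}?op_type=create"
    request = urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # mandatory since 6.0
        method="PUT",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 201  # 201 Created: the id was new
    except urllib.error.HTTPError as err:
        if err.code == 409:  # id already exists: ignore this document
            return False
        raise  # any other status is a genuine failure
```

For example, `create_if_absent("http://localhost:9200", "twitter", "1", {"user": "kimchy"})` would return True the first time and False on every later call with the same id. Letting Elasticsearch reject the duplicate avoids the race that a separate "does this id exist?" check followed by an insert would have when several writers run concurrently.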

Another way to specify create is to use the following URI:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -H 'Content-Type: application/json' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}'
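For the original use case (storing a batch of documents and ignoring those that already exist), the same create semantics are available per item through the bulk API: each create action fails with status 409 only for its own document if the id is taken, while the other items are still indexed. The bulk request as a whole still returns 200 in that case (with "errors": true), so the per-item statuses have to be inspected. A hedged Python sketch, assuming Elasticsearch 7+; the helper names and the index name are hypothetical:

```python
import json
import urllib.request
from typing import Dict, List, Tuple


def build_bulk_create_body(index: str, docs: Dict[str, dict]) -> str:
    """Build an NDJSON _bulk body with one `create` action per document."""
    lines = []
    for doc_id, doc in docs.items():
        # Action line: `create` fails for this item only if the id exists.
        lines.append(json.dumps({"create": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))  # source line follows its action line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def post_bulk(base_url: str, body: str) -> dict:
    """Send the body to the _bulk endpoint and return the parsed response."""
    request = urllib.request.Request(
        f"{base_url}/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def split_bulk_result(response: dict) -> Tuple[List[str], List[str]]:
    """Return (created_ids, skipped_ids) from a parsed _bulk response."""
    created: List[str] = []
    skipped: List[str] = []
    for item in response["items"]:
        result = item["create"]
        if result["status"] == 201:
            created.append(result["_id"])
        elif result["status"] == 409:
            # version_conflict_engine_exception: the id already existed
            skipped.append(result["_id"])
        else:
            raise RuntimeError(f"create failed for {result['_id']}: {result.get('error')}")
    return created, skipped
```

Usage would look like `created, skipped = split_bulk_result(post_bulk("http://localhost:9200", build_bulk_create_body("twitter", docs)))`, where `docs` maps ids to document bodies.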
answered Oct 24 '22 at 17:10 by dadoonet