Ways to process only new data (indexed after the last run) in Elasticsearch?

Tags:

elasticsearch

Is there a way to get the date and time that an Elasticsearch document was written?

I am running ES queries via Spark and would prefer NOT to look through all the documents that I have already processed. Instead, I would like to read only the documents that were ingested between the last time the program ran and now.

What is the most efficient way to do this?

I have looked at:

Updating each document to add a field holding an array of booleans that records which analytics have already looked at it. The downside is waiting for the update to occur.
An index-per-time-frame approach, which would break the current indexes down into smaller ones, for example one per hour. The downside I see is the number of open file descriptors.
??

Elasticsearch version 5.6


asked Dec 11 '17 19:12

SparkleGoat

1 Answer

I posted the question on the Elasticsearch discussion board, and it appears that using an ingest pipeline is the best option.
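
The answer does not spell out the pipeline itself, so the following is only a minimal sketch of one way it could look, not the exact setup from the discussion board. It assumes a `set` processor that copies the ingest metadata value `{{_ingest.timestamp}}` into a field, and a `range` query over that field to select documents ingested since the last run. The field name `ingested_at` and both helper function names are hypothetical, and the field would probably need to be mapped as a `date` for the range query to behave as intended.

```python
import json
from datetime import datetime, timezone

# Hypothetical field name; any name not already used in the mapping would do.
TIMESTAMP_FIELD = "ingested_at"


def build_timestamp_pipeline(field=TIMESTAMP_FIELD):
    """Return an ingest pipeline body that stamps documents at ingest time.

    The body would be sent with PUT _ingest/pipeline/<pipeline-id>, and
    documents would then be indexed with ?pipeline=<pipeline-id>.
    """
    return {
        "description": "Stamp each document with the time it was ingested",
        "processors": [
            {"set": {"field": field, "value": "{{_ingest.timestamp}}"}}
        ],
    }


def build_new_docs_query(last_run, now, field=TIMESTAMP_FIELD):
    """Return a search body matching documents ingested in (last_run, now]."""
    return {
        "query": {
            "range": {
                field: {"gt": last_run.isoformat(), "lte": now.isoformat()}
            }
        }
    }


if __name__ == "__main__":
    # Example values only; a real job would load last_run from wherever it
    # persisted the end of the previous window.
    last_run = datetime(2017, 12, 11, 19, 12, tzinfo=timezone.utc)
    now = datetime.now(timezone.utc)
    print(json.dumps(build_timestamp_pipeline(), indent=2))
    print(json.dumps(build_new_docs_query(last_run, now), indent=2))
```

With the Elasticsearch Spark connector, the JSON produced by `build_new_docs_query` could presumably be passed through the `es.query` setting so that only the new documents are read. After a successful run, the job would store `now` and use it as `last_run` next time. Documents indexed before the pipeline was put in place would not carry the field and so would not match the query.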


answered Nov 12 '22 01:11

SparkleGoat

