
Is there a way to manually set an ElasticSearch document id when inserting via AWS Kinesis Firehose?

I have an AWS Kinesis Firehose Stream set up to feed data into an AWS ElasticSearch cluster, and I can successfully insert documents by sending them to the Firehose Stream, which loads them into ElasticSearch.

But I would like to be able to manually specify a document's id value when sending it to the Firehose Stream. I'm successfully using the AWS PHP SDK to send data to Firehose, but I can't figure out whether there's a way to manually set a document's id.

$firehoseParams = [
    'DeliveryStreamName' => 'myStreamName', // REQUIRED
    'Record' => [ // REQUIRED
        'Data' => '{"json_encoded": "data", ...}', // REQUIRED
    ],
];
$firehoseResult = $this->_firehoseClient->putRecord($firehoseParams);

I've tried setting id, _id, and esDocumentId values in the JSON data, all to no avail.

Anyone have any ideas?

Alex Coleman asked Aug 31 '25 04:08



1 Answer

Firehose Delivery Stream destinations are append-only and, in the case of OpenSearch (AWS Elasticsearch), do not support upserts. Firehose generates a unique ID for each record it streams and uses that as the document ID. This cannot be user-configured at this time. If you are an AWS Enterprise Support customer, you can request that this feature be added to Firehose by talking with your Solutions Architect (SA) or Technical Account Manager (TAM).

One possible short-term solution is to use a Kinesis Stream and trigger a Lambda function that upserts documents to OpenSearch using the OpenSearch APIs. Your client would push JSON data to the Kinesis Stream instead of Firehose. Rather than only performing transformations, the Lambda function would trigger for records in the stream, perform any transformation, and handle the upsert to OpenSearch itself, which lets it choose each document's ID.
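
As a rough illustration of that workaround, here is a minimal sketch of such a Lambda function in Python. It decodes each Kinesis record, takes the document ID from a field in the JSON, and builds a bulk request of update actions with doc_as_upsert set. The environment variable names (OS_INDEX, OS_ID_FIELD, OS_ENDPOINT), the default index name, and the choice of an "id" field are all assumptions made for the example. Authentication is left out entirely: an AWS-hosted domain would normally require signed requests or other credentials, so treat the HTTP call as a placeholder.

```python
import base64
import json
import os
import urllib.request

# Hypothetical configuration; none of these names come from AWS or the question.
INDEX = os.environ.get("OS_INDEX", "my-index")
ID_FIELD = os.environ.get("OS_ID_FIELD", "id")
ENDPOINT = os.environ.get("OS_ENDPOINT", "")  # base URL of the search domain


def build_bulk_upsert_body(event, index=INDEX, id_field=ID_FIELD):
    """Turn a Kinesis-triggered Lambda event into a _bulk request body.

    Each record becomes an "update" action with doc_as_upsert, so a document
    is created if its ID is new and merged into the existing one otherwise.
    """
    lines = []
    for record in event.get("Records", []):
        # Kinesis delivers the record payload base64-encoded.
        doc = json.loads(base64.b64decode(record["kinesis"]["data"]))
        action = {"update": {"_index": index, "_id": str(doc[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def handler(event, context):
    """Lambda entry point: upsert every record in the batch."""
    body = build_bulk_upsert_body(event)
    if not body:
        return {"upserted": 0}
    request = urllib.request.Request(
        ENDPOINT.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    # Assumption: no request signing or auth is shown here; add whatever
    # your domain's access policy requires before using this for real.
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read())
    return {"upserted": len(result.get("items", [])), "errors": result.get("errors")}
```

With this approach the producer sends the same JSON as before, but to the Kinesis Stream rather than Firehose, and includes the chosen ID in the field the function reads.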

Kernel0100bin answered Sep 02 '25 16:09
