I have an AWS CloudSearch instance that I am still developing. At times, such as when I modify the format of a field, I find myself wanting to wipe out all of the data and regenerate it. Is there any way to clear out all of the data using the console, or do I have to do it programmatically? If I do have to do it programmatically (i.e. generate and POST a bunch of "delete" SDF files), is there any good way to query for all documents in a CloudSearch instance? I guess I could just delete and re-create the instance, but that takes a while and loses all of the indexes, rank expressions, text options, etc.


How to clear all data from AWS CloudSearch?

3 Answers

Using aws and jq from the command line (tested with bash on mac):

export CS_DOMAIN=https://yoursearchdomain.yourregion.cloudsearch.amazonaws.com

# Get ids of all existing documents, reformat as
# [{ type: "delete", id: "ID" }, ...] using jq
aws cloudsearchdomain search \
  --endpoint-url=$CS_DOMAIN \
  --size=10000 \
  --query-parser=structured \
  --search-query="matchall" \
  | jq '[.hits.hit[] | {type: "delete", id: .id}]' \
  > delete-all.json

# Delete the documents
aws cloudsearchdomain upload-documents \
  --endpoint-url=$CS_DOMAIN \
  --content-type='application/json' \
  --documents=delete-all.json

For more info on jq, see "Reshaping JSON with jq".

Update Feb 22, 2017

Added --size to get the maximum number of documents (10,000) at a time. You may need to repeat this script multiple times. Also, --search-query can take something more specific, if you want to be selective about the documents getting deleted.
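As an alternative to re-running the script by hand, the IDs can be collected page by page with a search cursor. The following is only a sketch under stated assumptions: `collectAllIds` and `toDeleteBatch` are hypothetical helper names, and `search` is an injected function that takes search parameters and returns a promise of the response, so the paging logic can be tried without a real domain. It assumes the search API accepts `cursor: 'initial'` on the first request and returns the next cursor in `hits.cursor`.

```javascript
// Sketch: page through every document ID with a cursor.
// `search` is a hypothetical injected function: (params) => Promise<response>.
async function collectAllIds(search, pageSize = 10000) {
  const ids = [];
  let cursor = 'initial'; // starting cursor value for the first request
  while (cursor) {
    const data = await search({
      query: 'matchall',
      queryParser: 'structured',
      size: pageSize,
      cursor: cursor,
    });
    const hits = data.hits.hit;
    if (hits.length === 0) break; // an empty page means nothing is left
    for (const hit of hits) ids.push(hit.id);
    cursor = data.hits.cursor; // stop if no further cursor is returned
  }
  return ids;
}

// Reshape IDs into the same delete batch format the jq filter produces.
function toDeleteBatch(ids) {
  return ids.map((id) => ({ type: 'delete', id: id }));
}
```

With the AWS SDK for JavaScript v2, `search` could plausibly be wired up as `(p) => client.search(p).promise()`, where `client` is an `AWS.CloudSearchDomain` instance; the result of `toDeleteBatch` can then be written to delete-all.json and uploaded as shown above.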


answered Sep 30 '22 01:09

Kevin Tonon

Best answer I've been able to find was somewhat buried in the AWS docs. To wit:

Amazon CloudSearch currently does not provide a mechanism for deleting all of the documents in a domain. However, you can clone the domain configuration to start over with an empty domain. For more information, see Cloning an Existing Domain's Indexing Options.

Source: http://docs.aws.amazon.com/cloudsearch/latest/developerguide/Troubleshooting.html#ts.cleardomain

answered Sep 29 '22 23:09

biggusjimmus

On my side, I used a local nodejs script like this:

var AWS = require('aws-sdk');
var fs = require('fs');

AWS.config.update({
    accessKeyId: '<your AccessKey>',
    secretAccessKey: '<Your secretAccessKey>',
    region: '<your region>'
});

// Structured query matching every document that has one of the listed facet values.
// size is set explicitly because a search returns only 10 hits by default.
var params = {
    query: "(or <your facet field>:'<one facet value>' <your facet field>:'<another facet value>')",
    queryParser: 'structured',
    size: 10000
};

// The client takes the domain's search endpoint, not the search parameters
var cloudsearchdomain = new AWS.CloudSearchDomain({
    endpoint: '<your CloudSearch endpoint>'
});

cloudsearchdomain.search(params, function(err, data) {
    if (err) {
        console.log("Failed");
        console.log(err);
        return;
    }

    // Build one "delete" operation per returned document ID
    var result = [];
    for (var i = 0; i < data.hits.hit.length; i++) {
        result.push({"type": "delete", "id": data.hits.hit[i].id});
    }

    fs.writeFile("delete.json", JSON.stringify(result), function(err) {
        if (err) { return console.log(err); }
        console.log("The file was saved!");
    });
});

You have to know all the values of at least one facet field to be able to request all IDs. In my code I listed just two values (in the (or ...) section), but you can have more.

Once that is done, you have a delete.json file that you can upload with the AWS CLI using this command:

aws cloudsearchdomain upload-documents --documents delete.json --content-type application/json --endpoint-url <your CloudSearch endpoint>

That did the job for me!
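If you would rather upload from the same Node script instead of switching to the CLI, something like the following might work. It is a sketch, not a tested production routine: `chunkBatch` and `uploadDeletes` are hypothetical helper names, `upload` is an injected function that takes upload parameters and returns a promise, and the default chunk size of 5000 operations is an arbitrary assumption chosen to keep each request small.

```javascript
// Split a delete batch into chunks of at most `maxDocs` operations.
function chunkBatch(batch, maxDocs) {
  const chunks = [];
  for (let i = 0; i < batch.length; i += maxDocs) {
    chunks.push(batch.slice(i, i + maxDocs));
  }
  return chunks;
}

// Upload the chunks one after another.
// `upload` is a hypothetical injected function: (params) => Promise.
async function uploadDeletes(upload, batch, maxDocs = 5000) {
  let uploaded = 0;
  for (const chunk of chunkBatch(batch, maxDocs)) {
    await upload({
      contentType: 'application/json',
      documents: JSON.stringify(chunk),
    });
    uploaded += chunk.length;
  }
  return uploaded; // total number of delete operations sent
}
```

With the AWS SDK for JavaScript v2, the call could look like `uploadDeletes((p) => cloudsearchdomain.uploadDocuments(p).promise(), result)`, reusing the `cloudsearchdomain` client and the `result` array from the script above.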

answered Sep 29 '22 23:09

Arnaduga
