
Elasticsearch: delete documents using Logstash and CSV

Is there any way to delete documents from Elasticsearch using Logstash and a CSV file? I read the Logstash documentation and found nothing, and I tried a few configs using action "delete", but nothing happened:

output {
    elasticsearch{
        action => "delete"
        host => "localhost"
        index => "index_name"
        document_id => "%{id}"
    }
} 

Has anyone tried this? Is there anything special that I should add to the input and filter sections of the config? I used the file plugin for input and the csv plugin for filter.

asked Oct 22 '25 21:10 by karina


2 Answers

In addition to Val's answer: if a single input mixes rows to delete with rows to upsert, you can handle both in one pipeline, provided each row carries a flag that identifies the ones to delete. The action parameter of the elasticsearch output accepts a field reference, so it can take its value from a per-row field. Even better, store that value in a @metadata field: it can be used in a field reference, but it is never sent to the output, so it is not indexed.

For example, in your filter section:

filter {
    # [deleted] is the name of your flag field
    if [deleted] {
        mutate {
            add_field => {
                "[@metadata][elasticsearch_action]" => "delete"
            }
        }
    } else {
        mutate {
            add_field => {
                "[@metadata][elasticsearch_action]" => "index"
            }
        }
    }
    # the flag is only needed for routing, so drop it in both cases
    mutate {
        remove_field => [ "deleted" ]
    }
}

Then, in your output section, reference the metadata field:

output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "myindex"
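        action => "%{[@metadata][elasticsearch_action]}"
        # a delete needs the document ID; assumes it is in an "id" field, as in the question
        document_id => "%{id}"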
        document_type => "mytype"
    }
}
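
If the flag comes from a CSV column, keep in mind that the csv filter produces strings, and as far as I know a bare if [deleted] is true for any non-empty string, including "false". Here is a minimal sketch of the filter above with an explicit string comparison. It assumes a two-column file; the column names id and deleted and the values "true"/"false" are hypothetical, so adapt them to your data:

```
filter {
    csv {
        # hypothetical column names for this sketch
        columns => ["id", "deleted"]
    }
    # compare against the string, since CSV values are not booleans
    if [deleted] == "true" {
        mutate { add_field => { "[@metadata][elasticsearch_action]" => "delete" } }
    } else {
        mutate { add_field => { "[@metadata][elasticsearch_action]" => "index" } }
    }
    # the flag itself should not end up in the indexed document
    mutate { remove_field => [ "deleted" ] }
}
```

With this in place, a row such as 12345,true would be routed to a delete and a row such as 12346,false to an index.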

answered Oct 25 '25 11:10 by anon


It is definitely possible to do what you suggest. However, if you're using Logstash 1.5, you need to use the transport protocol, because Logstash 1.5 has a bug when doing deletes over the HTTP protocol (see issue #195).

So if your delete.csv file is formatted like this:

id
12345
12346
12347

And your delete.conf Logstash config looks like this:

input {
    file {
        path => "/path/to/your/delete.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    csv {
        columns => ["id"]
    }
}
output {
    elasticsearch{
        action => "delete"
        host => "localhost"
        port => 9300                         # make sure you have this
        protocol => "transport"              # make sure you have this
        index => "your_index"                # replace this
        document_type => "your_doc_type"     # replace this
        document_id => "%{id}"
    }
}

Then, when you run bin/logstash -f delete.conf, Logstash deletes every document whose id is listed in your CSV file.
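
For readers on Logstash 2.0 or later: to my knowledge, the elasticsearch output there speaks HTTP only, and the host, port and protocol options were replaced by hosts, so the transport workaround no longer applies. A sketch of the equivalent output section, assuming Elasticsearch listens on the default HTTP port 9200 on localhost:

```
output {
    elasticsearch {
        action => "delete"
        hosts => ["localhost:9200"]          # assumption: default HTTP port
        index => "your_index"                # replace this
        document_id => "%{id}"
    }
}
```

The input and filter sections stay the same as in delete.conf.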

answered Oct 25 '25 11:10 by Val