 

Importing data into Elasticsearch with Logstash is too slow

Tags:

logstash

Hi all,

I am using Logstash to import some CSV files into Elasticsearch, and the import is too slow.

The config is:

input {
  stdin {}
}

filter {
  csv {
    columns => ['date', 'metric1', 'id', 'metric2', 'country_id', 'metric3', 'region_id']
    separator => ","
  }

  mutate {
    convert => [ "id", "integer" ]
    convert => [ "country_id", "integer" ]
    convert => [ "region_id", "float" ]
  }
}

output {
  elasticsearch {
    action => "index"
    protocol => http
    host => "10.64.201.***"
    index => "csv_test_data_01"
    workers => 1
  }

  stdout {
    codec => rubydebug
  }
}

10.64.201.*** is the IP address of the master node of the Elasticsearch cluster, which has three nodes.

The CSV files are stored on one of these three nodes.

I simply run the command: blablabla -f **.config < csv files

This starts importing the CSV files into the Elasticsearch cluster, but the import is too slow.

Is there a better solution for this case? Or did I do something wrong?

asked Nov 21 '25 21:11 by wuji


1 Answer

Start by isolating the problem:

  1. Find out whether the bottleneck is in the read operation, the Logstash parsing, the network connection to Elasticsearch, or an I/O limitation on the Elasticsearch server.
  2. If the problem is in the read operation, consider a different way of reading the CSV files. The disks may also be the limit, in which case consider moving to higher-performance disks (check I/O usage with iotop).
  3. To check whether Logstash is the problem, try importing the data without any parsing and see if performance improves. If it does, make the parsing more efficient or prepare a CSV file that is already pre-parsed.
  4. To check whether Elasticsearch is the problem, print the data to the screen instead of sending it to Elasticsearch and see if performance improves. If it does, check network and disk usage on the Elasticsearch side to narrow it down, then move to a better network connection or higher-performance disks.
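
The steps above could be run as three variations of the config from the question. The following is only a sketch: the file name baseline.conf is made up for illustration, and it assumes the dots codec is available in your Logstash version. The dots codec prints one character per event, so the output stays cheap and the run time mostly reflects reading (stage A) or reading plus parsing (stage B).

```
# baseline.conf (hypothetical file name, used only for this sketch)

# Stage A: read only.
# No filter block, and no Elasticsearch output, so only the stdin read
# and Logstash's own event handling are measured.
input {
  stdin {}
}

output {
  stdout {
    # Assumption: the dots codec is installed. It emits one "." per event
    # instead of pretty-printing the whole event like rubydebug does.
    codec => dots
  }
}

# Stage B: reading plus parsing.
# Paste the filter { csv { ... } mutate { ... } } block from the question
# between the input and output blocks above and rerun on the same file.
# A large slowdown compared with stage A points at the filters.

# Stage C: the full pipeline.
# Replace the stdout output above with the elasticsearch { ... } output
# from the question and rerun. A large slowdown compared with stage B
# points at the network or at the Elasticsearch nodes.
```

Run each stage on the same CSV file and compare the wall-clock times (for example with the shell's time, or by piping the dots through a rate meter such as pv). Note that the config in the question also prints every event with rubydebug, which likely adds overhead of its own, so it is worth leaving that output out of stage C to see how much it costs.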
answered Nov 24 '25 15:11 by Tom Kregenbild