
Importing a large JSON file into Elasticsearch

Below is what my rdns.json file looks like; it has around 1 billion records, one JSON object per line. I have tried several ways to import the file, but all of them failed.

{"timestamp":"1573629372","name":"1.10.178.205","hostname":"node-a19.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573636816","name":"1.10.178.206","hostname":"node-a1a.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573647966","name":"1.10.178.207","hostname":"node-a1b.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573650758","name":"1.10.178.208","hostname":"node-a1c.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573660230","name":"1.10.178.209","hostname":"node-a1d.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573652982","name":"1.10.178.21","hostname":"node-9w5.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573614753","name":"1.10.178.210","hostname":"node-a1e.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573616716","name":"1.10.178.211","hostname":"node-a1f.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573626432","name":"1.10.178.212","hostname":"node-a1g.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573611374","name":"1.10.178.213","hostname":"node-a1h.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573655790","name":"1.10.178.214","hostname":"node-a1i.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573635098","name":"1.10.178.215","hostname":"node-a1j.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573628481","name":"1.10.178.216","hostname":"node-a1k.pool-1-10.dynamic.totinternet.net","type":"ptr"}

Could someone please guide me on how I can import this file into Elasticsearch?

asked by iqzer0


2 Answers

Nothing beats a native way of uploading a file to Elasticsearch, but have you considered using Node.js streams, newline-delimited JSON (ndjson) and etl to stream the file and bulk-index it into Elasticsearch as you go? Basically something like this:

const es = require("elasticsearch");
const etl = require("etl");
const ndjson = require("ndjson");
const fs = require("fs");

const esClient = new es.Client({
  host: "http://localhost:9200", // point this at your cluster
  log: "trace"                   // very verbose; consider removing for a large import
});

fs.createReadStream(`${__dirname}/rdns.json`)
  .pipe(ndjson.parse())  // parse the newline-delimited JSON, one document per line
  .pipe(etl.collect(10)) // batch size per bulk request; tune it to your document size and cluster configuration
  .pipe(etl.elastic.index(esClient, "someindex", "someType")) // bulk-index each batch
  .promise()
  .then(res => console.log(res))
  .catch(err => console.log(err));
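
All three libraries install from npm (npm install elasticsearch etl ndjson). The 10 in etl.collect(10) is just how many documents go into each bulk request; for a file of your size you will almost certainly want a larger batch, and you may want to drop the trace logging so the client does not log every single request.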
answered by Ashish Modi


The solution was to use elasticsearch_loader

It handled my file, which was 128 GB, very nicely and imported it without my needing to do any reformatting of the file. The command I used was:

elasticsearch_loader --index rdns --type dnsrecords json rdns.json --lines

Do note that it takes quite some time to post the data, though.
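
For anyone else trying this: the tool installs from PyPI, and something like the following should work (the --es-host and --bulk-size flags are from memory of the tool's options rather than from my exact run, so double-check with elasticsearch_loader --help for your version):

pip install elasticsearch-loader

elasticsearch_loader --es-host http://localhost:9200 --bulk-size 500 --index rdns --type dnsrecords json rdns.json --lines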

answered by iqzer0


