How to load and index parquet files into Elasticsearch?

I have a very large parquet file that I need to import into Elasticsearch. I searched the net but could not find any useful results. Does the latest version of Elasticsearch support this format?

asked Mar 05 '16 by ArefehTam

2 Answers

I'm the author of Moshe/elasticsearch_loader
I wrote ESL for this exact problem.
You can install it with pip:

pip install elasticsearch-loader[parquet]

Then you will be able to load parquet files into Elasticsearch by issuing:

elasticsearch_loader --index incidents --type incident parquet file1.parquet file2.parquet
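
If you prefer to script the load yourself instead of using the CLI, here is a rough sketch using pyarrow and the official elasticsearch Python client (this is not part of elasticsearch_loader; the host, index name, and file name below are placeholders, and it assumes reasonably recent pyarrow and elasticsearch packages):

import pyarrow.parquet as pq
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")  # placeholder host

def actions(parquet_path, index_name):
    # Stream the file in record batches so a very large parquet file
    # does not need to fit in memory; emit one bulk action per row.
    parquet_file = pq.ParquetFile(parquet_path)
    for batch in parquet_file.iter_batches():
        for row in batch.to_pylist():
            yield {"_index": index_name, "_source": row}

bulk(es, actions("file1.parquet", "incidents"))

bulk() sends the documents in chunks (tune its chunk_size argument if your rows are large); no document type is set because recent Elasticsearch versions no longer use them.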
answered Sep 22 '22 by MosheZada


One way to do it is to use ConvertUtils and call the convertParquetToCSV() method.
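
If that conversion utility is not convenient in your environment, a minimal pandas sketch achieves the same parquet-to-CSV step (assuming pandas plus a parquet engine such as pyarrow is installed; file names are placeholders):

import pandas as pd

df = pd.read_parquet("file1.parquet")   # placeholder input file
df.to_csv("file1.csv", index=False)     # CSV for Logstash to consume

For a very large file you may prefer to convert it in chunks rather than loading it all at once.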

Then when your CSV file has been generated, you can simply consume it by using Logstash with

  • a file input,
  • a csv filter and
  • an elasticsearch output.

Sample configuration:

input {
    file {
        path => "/path/to/your/parquet/as/csv/file"
    }
}
filter {
    csv {
        columns => ["col1", "col2"]
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
    }
}
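
Note that Logstash's file input tails files by default (start_position defaults to "end"), so for a one-off import of an existing CSV you will typically want to add start_position => "beginning" to the file input. You will probably also want to set an explicit index => on the elasticsearch output so the documents do not land in the default Logstash index.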
answered Sep 23 '22 by Val