I have a very large Parquet file that I need to import into Elasticsearch. I searched online but could not find anything useful. Does the latest version of Elasticsearch support this format?
I'm the author of Moshe/elasticsearch_loader
I wrote ESL for this exact problem.
You can download it with pip:
pip install elasticsearch-loader[parquet]
And then you will be able to load parquet files into elasticsearch by issuing:
elasticsearch_loader --index incidents --type incident parquet file1.parquet file2.parquet
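For readers who want to see the general idea in code, here is a minimal Python sketch of batch-loading a Parquet file into Elasticsearch. It is not how elasticsearch_loader is implemented; it is only an illustration that assumes the pyarrow and elasticsearch packages are installed and that a node is reachable at http://localhost:9200. The function names to_actions and load_parquet, the batch size, and the URL are all hypothetical.

```python
def to_actions(rows, index):
    """Wrap each row dict in a bulk 'index' action for the given index."""
    for row in rows:
        yield {"_index": index, "_source": row}


def load_parquet(path, index, es_url="http://localhost:9200", batch_size=5_000):
    """Read a Parquet file batch by batch and bulk-index every row.

    Assumption: es_url points at a reachable Elasticsearch node.
    """
    # Third-party packages, imported lazily so to_actions() works without them.
    from elasticsearch import Elasticsearch, helpers
    import pyarrow.parquet as pq

    client = Elasticsearch(es_url)
    indexed = 0
    # iter_batches() avoids loading the whole (very large) file into memory.
    for batch in pq.ParquetFile(path).iter_batches(batch_size=batch_size):
        ok, _ = helpers.bulk(client, to_actions(batch.to_pylist(), index))
        indexed += ok
    return indexed
```

Calling load_parquet("file1.parquet", "incidents") would then roughly correspond to the command above for a single file, under the assumptions stated.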
One way to do it is to use ConvertUtils and call its convertParquetToCSV() method.
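If the Java helper is not available, the same conversion can be sketched in Python. This is an alternative illustration, not the ConvertUtils code: it assumes the pyarrow package is installed, and the names write_rows and parquet_to_csv are made up for this example. It writes no header row, so it matches a Logstash csv filter that lists the column names explicitly.

```python
import csv


def write_rows(rows, columns, out):
    """Write dict rows to `out` as CSV in a fixed column order, without a header."""
    writer = csv.writer(out)
    count = 0
    for row in rows:
        # Missing keys and None values are written as empty fields.
        writer.writerow([row.get(name) for name in columns])
        count += 1
    return count


def parquet_to_csv(parquet_path, csv_path, batch_size=10_000):
    """Stream a Parquet file to CSV one batch at a time to bound memory use."""
    import pyarrow.parquet as pq  # third-party; imported lazily

    parquet_file = pq.ParquetFile(parquet_path)
    columns = parquet_file.schema_arrow.names
    total = 0
    with open(csv_path, "w", newline="") as out:
        for batch in parquet_file.iter_batches(batch_size=batch_size):
            total += write_rows(batch.to_pylist(), columns, out)
    return total
```

Nested or binary Parquet columns do not map cleanly onto CSV fields, so this sketch is only reasonable for flat, scalar columns.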
Once the CSV file has been generated, you can consume it with Logstash, using the file input, the csv filter and the elasticsearch output. Sample configuration:
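input {
  file {
    path => "/path/to/your/parquet/as/csv/file"
    # Read the existing file from the start; by default the file input
    # only picks up lines appended after Logstash starts.
    start_position => "beginning"
  }
}
filter {
  csv {
    # Column names, in the order they appear in the CSV file
    columns => ["col1", "col2"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}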