
Creating DataFrame from ElasticSearch Results

I am trying to build a DataFrame in pandas from the results of a very basic Elasticsearch query. I am getting the data I need, but I can't work out how to slice the results to build the proper DataFrame. I really only care about the timestamp and path of each result. I have tried a few different es.search patterns.

Code:

    from datetime import datetime
    from elasticsearch import Elasticsearch
    from pandas import DataFrame, Series
    import pandas as pd
    import matplotlib.pyplot as plt

    es = Elasticsearch(host="192.168.121.252")
    res = es.search(index="_all", doc_type='logs', body={"query": {"match_all": {}}}, size=2, fields=('path','@timestamp'))

The response has four top-level keys: [u'hits', u'_shards', u'took', u'timed_out']. My results are inside hits.

    res['hits']['hits']
    Out[47]:
    [{u'_id': u'a1XHMhdHQB2uV7oq6dUldg',
      u'_index': u'logstash-2014.08.07',
      u'_score': 1.0,
      u'_type': u'logs',
      u'fields': {u'@timestamp': u'2014-08-07T12:36:00.086Z',
       u'path': u'app2.log'}},
     {u'_id': u'TcBvro_1QMqF4ORC-XlAPQ',
      u'_index': u'logstash-2014.08.07',
      u'_score': 1.0,
      u'_type': u'logs',
      u'fields': {u'@timestamp': u'2014-08-07T12:36:00.200Z',
       u'path': u'app1.log'}}]

The only things I care about are the timestamp and path for each hit.

    res['hits']['hits'][0]['fields']
    Out[48]:
    {u'@timestamp': u'2014-08-07T12:36:00.086Z',
     u'path': u'app1.log'}

I cannot for the life of me figure out how to get that result into a pandas DataFrame. For the 2 results returned, I would expect a DataFrame like this:

       timestamp                   path
    0  2014-08-07T12:36:00.086Z    app1.log
    1  2014-08-07T12:36:00.200Z    app2.log
Justin S asked Aug 07 '14 15:08



2 Answers

You could use the json_normalize function of pandas:

    import pandas as pd

    # pandas 1.0+ exposes json_normalize at the top level; the pandas.io.json path was deprecated in 1.0
    df = pd.json_normalize(res['hits']['hits'])

Then filter the resulting DataFrame by column name; the nested keys come out as dotted columns such as fields.path.
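
A minimal sketch of that filtering step, assuming pandas 1.0 or later (where json_normalize is available as pd.json_normalize). The `hits` list is copied from the question's output and stands in for res['hits']['hits']; the renamed column labels are just the ones the question asked for.

```python
import pandas as pd

# Sample hits copied from the question; stands in for res['hits']['hits'].
hits = [
    {"_id": "a1XHMhdHQB2uV7oq6dUldg", "_index": "logstash-2014.08.07",
     "_score": 1.0, "_type": "logs",
     "fields": {"@timestamp": "2014-08-07T12:36:00.086Z", "path": "app2.log"}},
    {"_id": "TcBvro_1QMqF4ORC-XlAPQ", "_index": "logstash-2014.08.07",
     "_score": 1.0, "_type": "logs",
     "fields": {"@timestamp": "2014-08-07T12:36:00.200Z", "path": "app1.log"}},
]

# json_normalize flattens nested dicts into dotted column names,
# so the two values of interest end up in 'fields.@timestamp' and 'fields.path'.
df = pd.json_normalize(hits)

# Keep only those two columns and give them shorter labels.
df = df[["fields.@timestamp", "fields.path"]].rename(
    columns={"fields.@timestamp": "timestamp", "fields.path": "path"}
)

print(df)
```

Selecting with a list of column names keeps the result a DataFrame, so the rename can be chained directly.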

Brown nightingale answered Oct 12 '22 05:10



Better yet, you can use the fantastic pandasticsearch library:

    from elasticsearch import Elasticsearch
    es = Elasticsearch('http://localhost:9200')
    result_dict = es.search(index="recruit", body={"query": {"match_all": {}}})

    from pandasticsearch import Select
    pandas_df = Select.from_dict(result_dict).to_pandas()
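
If adding a dependency is not an option, a roughly similar table can be built with plain pandas. This is only a sketch under assumptions: `hits_to_frame` is a hypothetical helper (not part of pandasticsearch or elasticsearch), the response is assumed to carry a 'fields' dict on every hit as in the question, and the sample response below is hand-built from the question's output rather than fetched from a cluster.

```python
import pandas as pd


def hits_to_frame(response, columns=("@timestamp", "path")):
    """Hypothetical helper: one row per hit, built from each hit's 'fields' dict."""
    rows = [hit["fields"] for hit in response["hits"]["hits"]]
    # Passing columns= keeps only the requested keys, in the requested order.
    return pd.DataFrame(rows, columns=list(columns))


# Hand-built stand-in for the dict returned by es.search in the question.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "a1XHMhdHQB2uV7oq6dUldg",
             "fields": {"@timestamp": "2014-08-07T12:36:00.086Z", "path": "app2.log"}},
            {"_id": "TcBvro_1QMqF4ORC-XlAPQ",
             "fields": {"@timestamp": "2014-08-07T12:36:00.200Z", "path": "app1.log"}},
        ]
    }
}

frame = hits_to_frame(sample_response)

# Parse the ISO 8601 strings so the column can be sorted or resampled later.
frame["@timestamp"] = pd.to_datetime(frame["@timestamp"], utc=True)

print(frame)
```

Converting the timestamp column up front is optional, but it makes time-based indexing and plotting simpler than working with raw strings.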
Phil B answered Oct 12 '22 06:10
