
Elasticsearch / Python / Proxy

I'm new to Stack Overflow, so I apologize for any mistakes.

I have to write a Python script that collects data from Elasticsearch and then writes it to a database. I am struggling to collect the data because my company's network is behind a proxy.

The script works without a proxy, but I don't know how to pass the proxy settings to Elasticsearch.

The following code works without a proxy:

es = Elasticsearch(['https://user:[email protected]/elasticsearch'])
res = es.search(index=index, body=request, search_type="count")

This is what I tried from behind the proxy:

es = Elasticsearch(['https://user:[email protected]/elasticsearch'], _proxy = 'http://proxy.org', _proxy_headers = {'basic_auth': 'user:pw'})
res = es.search(index=index, body=request, search_type="count")
return res

Does anyone know which keyword arguments I have to pass to Elasticsearch so that it uses the proxy?

Any help would be appreciated.

Thanks.

meulth asked Sep 25 '15 08:09



2 Answers

I got an answer on GitHub:

https://github.com/elastic/elasticsearch-py/issues/275#issuecomment-143781969

Thanks a ton again!

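from elasticsearch import Elasticsearch, RequestsHttpConnection

class MyConnection(RequestsHttpConnection):
    def __init__(self, *args, **kwargs):
        # Remove the custom keyword before the parent constructor sees it.
        proxies = kwargs.pop('proxies', {})
        super(MyConnection, self).__init__(*args, **kwargs)
        # Route every request of the underlying requests session through the proxy.
        self.session.proxies = proxies

# es_url is the Elasticsearch URL from the question.
es = Elasticsearch([es_url], connection_class=MyConnection, proxies={'https': 'http://user:[email protected]:port'})

print(es.info())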
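
If the proxy user name or password contains characters such as '@' or ':', the proxy URL in the proxies dictionary has to be percent-encoded, otherwise it is parsed incorrectly. Below is a minimal sketch of a helper that builds such a dictionary. The name build_proxies, its parameters, and the host proxy.example.org are hypothetical and not part of elasticsearch-py; only the standard library is used.

```python
from urllib.parse import quote


def build_proxies(host, port, user=None, password=None, scheme="http"):
    """Return a requests-style proxies dict (hypothetical helper, not a library API)."""
    auth = ""
    if user is not None:
        # Percent-encode the credentials so '@' or ':' inside them do not break the URL.
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    proxy_url = f"{scheme}://{auth}{host}:{port}"
    # Use the same proxy for plain HTTP and for HTTPS traffic.
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    # Placeholder values; replace them with the real proxy settings.
    print(build_proxies("proxy.example.org", 3128, "user", "p@ss:w"))
```

The returned dictionary could then be passed as the proxies argument together with connection_class=MyConnection, as in the code above.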
meulth answered Sep 29 '22 01:09


Generally, no extra code is needed for the proxy: the low-level Python HTTP modules should be able to pick up the system proxy settings (i.e. http_proxy) directly.

In later releases (at least 6.x) we can use the requests module instead of urllib3 as the transport, which solves this problem nicely, see https://elasticsearch-py.readthedocs.io/en/master/transports.html

# make sure http_proxy (or https_proxy for HTTPS URLs) is set in the system environment
from elasticsearch import Elasticsearch, RequestsHttpConnection
es = Elasticsearch([es_url], connection_class=RequestsHttpConnection)
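
To check whether the environment variables are actually visible to Python, something like the following sketch may help. It only uses the standard library; the function name set_proxy_env and the proxy URL are placeholders I made up for illustration. As far as I know, requests relies on the same environment detection when trust_env is left at its default, but treat that as an assumption and verify it in your setup.

```python
import os
import urllib.request


def set_proxy_env(proxy_url):
    """Export proxy_url for this process and return the proxies Python detects."""
    # Lowercase names take precedence if both spellings are present.
    os.environ["http_proxy"] = proxy_url
    os.environ["https_proxy"] = proxy_url
    # Reads every *_proxy variable from the environment into a scheme -> URL dict.
    return urllib.request.getproxies_environment()


if __name__ == "__main__":
    # Placeholder proxy address; replace it with the real one.
    detected = set_proxy_env("http://proxy.example.org:3128")
    print(detected.get("https"))
```

Setting the variables in the shell before starting the script has the same effect and avoids hard-coding the proxy in the code.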

Another possible problem is that search requests use the GET method by default, which was rejected by my old cache server (squid/3.19). In that case the extra parameter send_get_body_as has to be added, see https://elasticsearch-py.readthedocs.io/en/master/#environment-considerations

from elasticsearch import Elasticsearch
es = Elasticsearch(send_get_body_as='POST')
Larry Cai answered Sep 29 '22 01:09