I'm new to Stack Overflow, so I apologize if I make a mistake.
I have to write a Python script that collects some data from Elasticsearch and then writes it to a database. I am struggling to collect the data with Elasticsearch because the company I work for is behind a proxy.
The script works without a proxy, but I don't know how to pass the proxy settings down to Elasticsearch.
The following code works without a proxy:
es = Elasticsearch(['https://user:[email protected]/elasticsearch'])
res = es.search(index=index, body=request, search_type="count")
I tried the following from behind the proxy:
es = Elasticsearch(['https://user:[email protected]/elasticsearch'],
                   _proxy='http://proxy.org',
                   _proxy_headers={'basic_auth': 'user:pw'})
res = es.search(index=index, body=request, search_type="count")
return res
Does anyone know the keywords I have to pass to Elasticsearch so that it uses the proxy?
Any help would be appreciated.
Thanks.
I got an answer on GitHub:
https://github.com/elastic/elasticsearch-py/issues/275#issuecomment-143781969
Thanks a ton again!
from elasticsearch import Elasticsearch, RequestsHttpConnection

class MyConnection(RequestsHttpConnection):
    def __init__(self, *args, **kwargs):
        # Pull the proxies out of the kwargs before the parent constructor
        # sees them, then attach them to the underlying requests session.
        proxies = kwargs.pop('proxies', {})
        super(MyConnection, self).__init__(*args, **kwargs)
        self.session.proxies = proxies

es = Elasticsearch([es_url],
                   connection_class=MyConnection,
                   proxies={'https': 'http://user:[email protected]:port'})
print(es.info())
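For completeness, a quick usage sketch (the index name and query here are hypothetical, purely for illustration): once the client is built with the custom connection class, every request goes through the proxy configured on the underlying requests session.
# Hypothetical index and query, just to show the client in use
res = es.search(index='my-index', body={'query': {'match_all': {}}})
print(res['hits']['total'])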
Generally, we don't need to add extra code for the proxy; the Python low-level module should be able to use the system proxy (i.e. http_proxy) directly.
In later releases (at least 6.x) we can use the requests module instead of urllib3 to solve this problem nicely, see https://elasticsearch-py.readthedocs.io/en/master/transports.html
# make sure the http_proxy is in system env
from elasticsearch import Elasticsearch, RequestsHttpConnection
es = Elasticsearch([es_url], connection_class=RequestsHttpConnection)
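If the proxy variables are not already set system-wide, you can set them in the process environment before creating the client; requests picks these variables up automatically. The proxy URL below is a placeholder:
import os

# Placeholder proxy address; substitute your company's proxy
os.environ['HTTP_PROXY'] = 'http://user:[email protected]:3128'
os.environ['HTTPS_PROXY'] = 'http://user:[email protected]:3128'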
Another possible problem is that search uses the GET method by default; this was rejected by my old cache server (squid/3.19), so the extra parameter send_get_body_as should be added, see https://elasticsearch-py.readthedocs.io/en/master/#environment-considerations
from elasticsearch import Elasticsearch
es = Elasticsearch(send_get_body_as='POST')
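Putting both workarounds together, a client that tunnels through the system proxy and sends search bodies via POST might look like the sketch below (the cluster URL is a placeholder, and it assumes http_proxy/https_proxy are set in the environment):
from elasticsearch import Elasticsearch, RequestsHttpConnection

# RequestsHttpConnection honors http_proxy/https_proxy from the environment;
# send_get_body_as='POST' avoids proxies that reject GET requests with a body.
es = Elasticsearch(['https://user:[email protected]/elasticsearch'],
                   connection_class=RequestsHttpConnection,
                   send_get_body_as='POST')
print(es.info())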