
Python requests library is not working on Amazon AWS

I am trying the following code:

import requests

headers = {
    'authority': 'www.nseindia.com',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36 OPR/72.0.3815.320',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-GB,en;q=0.9',
}

nse = requests.Session()
x = nse.get("https://www.nseindia.com/", headers=headers)

print(x.text)

The code above works on my PC, but when I run it on AWS it does not respond.

I have also checked that ping www.nseindia.com works.

requests works for other sites such as Google, but not for this specific site on AWS.

On EC2:

Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> headers = {
...     'authority': 'www.nseindia.com',
...     'upgrade-insecure-requests': '1',
...     'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36 OPR/72.0.3815.320',
...     'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
...     'sec-fetch-site': 'none',
...     'sec-fetch-mode': 'navigate',
...     'sec-fetch-user': '?1',
...     'sec-fetch-dest': 'document',
...     'accept-language': 'en-GB,en;q=0.9',
... }
>>> nse = requests.Session()
>>> nse.get("https://www.nseindia.com/", headers=headers)

The last line produces no output.

On my PC:

Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> headers = {
...     'authority': 'www.nseindia.com',
...     'upgrade-insecure-requests': '1',
...     'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36 OPR/72.0.3815.320',
...     'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
...     'sec-fetch-site': 'none',
...     'sec-fetch-mode': 'navigate',
...     'sec-fetch-user': '?1',
...     'sec-fetch-dest': 'document',
...     'accept-language': 'en-GB,en;q=0.9',
... }
>>> nse = requests.Session()
>>> nse.get("https://www.nseindia.com/", headers=headers)
<Response [200]>
>>> 

Problem detected:

On EC2:

ping www.nseindia.com
PING www.nseindia.com (23.9.215.115) 56(84) bytes of data.
64 bytes from a23-9-215-115.deploy.static.akamaitechnologies.com (23.9.215.115): icmp_seq=1 ttl=51 time=1.07 ms
64 bytes from a23-9-215-115.deploy.static.akamaitechnologies.com (23.9.215.115): icmp_seq=2 ttl=51 time=1.09 ms

On my PC:

ping www.nseindia.com
PING www.nseindia.com (23.35.32.140) 56(84) bytes of data.
64 bytes from a23-35-32-140.deploy.static.akamaitechnologies.com (23.35.32.140): icmp_seq=1 ttl=57 time=65.8 ms
64 bytes from a23-35-32-140.deploy.static.akamaitechnologies.com (23.35.32.140): icmp_seq=2 ttl=57 time=61.5 ms
64 bytes from a23-35-32-140.deploy.static.akamaitechnologies.com (23.35.32.140): icmp_seq=3 ttl=57 time=73.1 ms

The two machines ping different IP addresses.

asked Dec 03 '20 07:12 by ooo



1 Answer

You get a different IP address from each ping because www.nseindia.com is served through the Akamai CDN, so you reach a different edge location depending on whether you connect from your home or work network or from AWS servers.

What's more, the IP address ranges of AWS are publicly known. It is therefore not uncommon for websites to explicitly block connections from AWS to protect against scraping, attacks, or other unwanted access. It seems that nseindia is blocking these AWS IP addresses. This is a known issue that other users have reported as well.
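To illustrate how easily a site can recognise AWS traffic, here is a minimal sketch that checks whether an address (for example, the public IP of your EC2 instance) falls inside a published range. It assumes data shaped like the ip-ranges.json file that AWS publishes at https://ip-ranges.amazonaws.com/ip-ranges.json. The function name matching_aws_prefixes and the SAMPLE_IP_RANGES data are hypothetical, and the sample prefixes are documentation-only ranges, not real AWS ones.

```python
import ipaddress

# Made-up sample shaped like AWS's ip-ranges.json. The prefixes below are
# reserved documentation ranges, NOT real AWS ranges.
SAMPLE_IP_RANGES = {
    "prefixes": [
        {"ip_prefix": "203.0.113.0/24", "region": "example-region-1", "service": "EC2"},
    ],
    "ipv6_prefixes": [
        {"ipv6_prefix": "2001:db8::/32", "region": "example-region-1", "service": "EC2"},
    ],
}


def matching_aws_prefixes(ip, ip_ranges):
    """Return every entry in ip_ranges whose prefix contains the address ip."""
    addr = ipaddress.ip_address(ip)
    matches = []
    for key, field in (("prefixes", "ip_prefix"), ("ipv6_prefixes", "ipv6_prefix")):
        for entry in ip_ranges.get(key, []):
            network = ipaddress.ip_network(entry[field])
            # Only compare addresses and networks of the same IP version.
            if addr.version == network.version and addr in network:
                matches.append(entry)
    return matches


print(matching_aws_prefixes("203.0.113.7", SAMPLE_IP_RANGES))
```

To run this against real data, download the JSON file from the URL above, parse it with json.load, and pass the resulting dictionary in place of SAMPLE_IP_RANGES. A non-empty result for your instance's public IP only shows that the address is identifiable as AWS; it does not prove that a given site blocks it.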

The solution is to avoid AWS and other popular cloud providers (nseindia blocks others too). You could try to proxy your AWS requests through a commercial VPN, your home or work network, or anything else that is not blacklisted. Sadly, this is a try-and-see approach. You should also consider the potential legal and ethical issues of bypassing these restrictions.
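If you do experiment with a proxy, something along these lines might be a starting point. It is only a sketch: make_session and fetch_status are hypothetical helper names, and the proxy URL in the usage comment is a placeholder, not a working service. The explicit timeout is there so that a silently dropped connection raises an error instead of hanging the way the EC2 session in the question does.

```python
import requests


def make_session(proxy_url=None):
    """Return a Session that optionally routes all traffic through proxy_url."""
    session = requests.Session()
    if proxy_url:
        session.proxies.update({"http": proxy_url, "https": proxy_url})
    return session


def fetch_status(session, url, headers=None, timeout=(5, 15)):
    """Return the HTTP status code, or None on a timeout or connection failure.

    timeout is (connect seconds, read seconds); without it, requests waits
    indefinitely when the server never answers.
    """
    try:
        response = session.get(url, headers=headers, timeout=timeout)
    except (requests.exceptions.Timeout, requests.exceptions.ConnectionError) as exc:
        print(f"request failed: {exc!r}")
        return None
    return response.status_code


# Example usage (placeholder proxy address, replace with one you are allowed to use):
# nse = make_session("http://proxy.example.com:8080")
# print(fetch_status(nse, "https://www.nseindia.com/", headers=headers))
```

Calling fetch_status with a session from make_session() and no proxy is also a quick way to confirm the diagnosis: on a blocked host you would expect None after the timeout rather than a response.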

answered Oct 04 '22 18:10 by Marcin