
403 Forbidden using Urllib2 [Python]

import urllib
import urllib2

url = 'https://www.instagram.com/accounts/login/ajax/'
values = {'username': 'User',
          'password': 'Pass'}

# POST the form-encoded credentials with a browser-like User-Agent
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers={'User-Agent': 'Mozilla/5.0'})
con = urllib2.urlopen(req)
the_page = con.read()  # read the body from the opened connection

Does anyone have any ideas about this? I keep getting the error "403 Forbidden". It's possible Instagram has something that won't let me connect via Python (I don't want to connect via their API). What is going on here?

Thanks!

EDIT: Adding more info.

The error I was getting was this:

This page could not be loaded. If you have cookies disabled in your browser, or you are browsing in Private Mode, please try enabling cookies or turning off Private Mode, and then retrying your action.

I edited my code but am still getting that error.

import cookielib
import urllib
import urllib2

# Keep cookies between requests so the login POST carries the session cookies
jar = cookielib.FileCookieJar("cookies")
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
print len(jar)  # prints 0
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36')]
result = opener.open('https://www.instagram.com')
print result.getcode(), len(jar)  # prints 200 and 2

url = 'https://www.instagram.com/accounts/login/ajax/'
values = {'username': 'username',
          'password': 'password'}

data = urllib.urlencode(values)

# Passing data makes this a POST through the cookie-aware opener
response = opener.open(url, data)
print response.getcode()
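
For anyone reading this on Python 3, where urllib2 and cookielib no longer exist under those names, the same cookie-jar flow can be sketched roughly as below. This is only a port of the attempt above, not a fix for the 403: the helper names (build_opener_with_cookies, build_login_request, try_login) are hypothetical, and the URLs are simply the ones from the question, which may have changed. The one addition is catching HTTPError so the body of an error response can be inspected instead of being lost in a traceback.

```python
import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the same endpoints as in the question.
HOME_URL = "https://www.instagram.com"
LOGIN_URL = "https://www.instagram.com/accounts/login/ajax/"
USER_AGENT = "Mozilla/5.0"


def build_opener_with_cookies():
    """Return (opener, jar); cookies set by responses are stored in jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


def build_login_request(username, password):
    """Build the login POST request without sending it."""
    # In Python 3 the POST body must be bytes, not str.
    data = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(LOGIN_URL, data=data)


def try_login(opener, username, password):
    """Send the login POST; return (status, body) even for HTTP errors."""
    request = build_login_request(username, password)
    try:
        with opener.open(request) as response:
            return response.getcode(), response.read()
    except urllib.error.HTTPError as err:
        # A 403 is raised as HTTPError, but the error still carries the body.
        return err.code, err.read()
```

A typical run would call build_opener_with_cookies(), open HOME_URL once so the jar picks up the initial cookies, and then call try_login(opener, ...) and print the returned status and body.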

k9b asked Feb 08 '23 22:02


1 Answer

Two important things, for starters:

  • Make sure you stay on the legal side. According to Instagram's Terms of Use:

We prohibit crawling, scraping, caching or otherwise accessing any content on the Service via automated means, including but not limited to, user profiles and photos (except as may be the result of standard search engine protocols or technologies used by a search engine with Instagram's express consent).

You must not create accounts with the Service through unauthorized means, including but not limited to, by using an automated device, script, bot, spider, crawler or scraper.

  • There is an Instagram API that would help you stay on the legal side and make life easier. There is a Python client: python-instagram

Aside from that, Instagram itself is JavaScript-heavy, and you may find it difficult to work with using just urllib2 or requests. If, for some reason, you cannot use the API, you could look into browser automation via selenium. Note that you can also automate a headless browser like PhantomJS. Here is sample code to log in:

from selenium import webdriver

USERNAME = "username"
PASSWORD = "password"

driver = webdriver.PhantomJS()
driver.get("https://www.instagram.com")

driver.find_element_by_name("username").send_keys(USERNAME)
driver.find_element_by_name("password").send_keys(PASSWORD)

driver.find_element_by_xpath("//button[. = 'Log in']").click()
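
Note that the snippet above relies on pieces that, as far as I know, Selenium 4 no longer ships: the find_element_by_* helpers and the PhantomJS driver. A rough equivalent for newer releases might look like the sketch below. It assumes Selenium 4 with Chrome and a matching driver available, and it assumes the page still exposes the same field names and button text as above, which is not guaranteed. The function names make_headless_driver and log_in are hypothetical. The locator strategies are written as the plain strings "name" and "xpath", which are the values of selenium's By.NAME and By.XPATH, so that log_in can be exercised with any driver-like object.

```python
USERNAME = "username"
PASSWORD = "password"

# Locator strategies; these strings are the values of By.NAME and By.XPATH.
BY_NAME = "name"
BY_XPATH = "xpath"


def make_headless_driver():
    """Start headless Chrome (assumes Selenium 4 and Chrome are installed)."""
    from selenium import webdriver  # imported lazily; only needed here

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def log_in(driver, username, password):
    """Fill in and submit the login form using the Selenium 4 locator API."""
    driver.get("https://www.instagram.com")

    # Assumption: same field names and button label as in the snippet above.
    driver.find_element(BY_NAME, "username").send_keys(username)
    driver.find_element(BY_NAME, "password").send_keys(password)
    driver.find_element(BY_XPATH, "//button[. = 'Log in']").click()
```

Usage would be along the lines of driver = make_headless_driver(), then log_in(driver, USERNAME, PASSWORD), and finally driver.quit() once you are done.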

alecxe answered Feb 13 '23 03:02