Instagram used to expose open data as JSON under the endpoint https://www.instagram.com/<username>/?__a=1. This changed overnight, and the endpoint is no longer available. What is the new endpoint, or what could be an alternative to it?
Thanks in advance!
You can create a session the way the instagram-scraper package does.
You don't need to provide a username and password. The snippet below creates an anonymous session.
import requests
import json
try:
    # Python 3
    from urllib.parse import urlparse
except ImportError:
    # Python 2 fallback
    from urlparse import urlparse

BASE_URL = 'https://www.instagram.com/'
CHROME_WIN_UA = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'

# Anonymous session that presents itself as a desktop browser
session = requests.Session()
session.headers = {'user-agent': CHROME_WIN_UA, 'Referer': BASE_URL}
session.cookies.set('ig_pr', '1')

# Load the home page once to obtain a CSRF token, then send it with later requests
req = session.get(BASE_URL)
session.headers.update({'X-CSRFToken': req.cookies['csrftoken']})

url = "https://www.instagram.com/instagram/?__a=1"
response = session.get(url, cookies="", headers={'Host': urlparse(url).hostname}, stream=False, timeout=90)
print(response.json())
https://github.com/arc298/instagram-scraper
In case you're looking for the regex:
<script type="text\/javascript">window[.]_sharedData = {[\s\S]*};<\/script>
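As a rough illustration of how that regex might be applied, here is a minimal Python sketch. It adapts the pattern slightly (a capture group around the JSON object and a non-greedy match, both my own changes), and the `extract_shared_data` helper and the `sample_html` string are hypothetical, made up for this example rather than taken from a real Instagram page.

```python
import json
import re

# Adapted from the regex above: a capture group around the JSON object and a
# non-greedy match so it stops at the first "};</script>" (both are assumptions).
SHARED_DATA_RE = re.compile(
    r'<script type="text/javascript">window\._sharedData = (\{[\s\S]*?\});</script>'
)


def extract_shared_data(html):
    """Return the parsed window._sharedData object, or None if it is not found."""
    match = SHARED_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical, trimmed-down page used only to illustrate the call.
sample_html = (
    '<html><body>'
    '<script type="text/javascript">window._sharedData = '
    '{"entry_data": {"ProfilePage": [{"graphql": {"user": {"biography": "hi"}}}]}};'
    '</script></body></html>'
)

data = extract_shared_data(sample_html)
print(data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["biography"])
```

Returning None when the pattern does not match makes it easier to notice when the page layout changes again.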
The endpoint does not exist anymore. Facebook is restricting its APIs because of the recent scandals. The data is still there, of course, because Instagram's frontend needs it, so the alternative right now is to scrape the page and find the JSON data there. Here is how I do it:
1. Request the profile page at https://www.instagram.com/<username>.
2. Find the script tag whose text starts with window._sharedData = . You can use regular expressions or a scraping library for this.
3. The rest of that text (except the ; at the end) is the JSON data you want.
Here is an example using Python:
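import requests
from bs4 import BeautifulSoup
import re
import json

r = requests.get('https://www.instagram.com/github/')
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(r.content, 'html.parser')
# Keep only the script tags whose content mentions window._sharedData
scripts = soup.find_all('script', type="text/javascript", string=re.compile('window._sharedData'))
# Drop the assignment prefix and the trailing ';' to leave plain JSON
stringified_json = scripts[0].string.replace('window._sharedData = ', '')[:-1]
json.loads(stringified_json)['entry_data']['ProfilePage'][0]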
Out[1]:
{u'graphql': {u'user': {u'biography': u'How people build software.',
u'blocked_by_viewer': False,
...
}
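Since the shape of this data is undocumented and can change without notice, it may be worth reading fields defensively. The sketch below (Python 3) uses a hypothetical `dig` helper, not part of any library, and a hand-written `profile_page` dict that copies the two fields shown in the output above.

```python
def dig(data, *keys, default=None):
    """Walk nested dicts/lists by key or index, returning default on any miss."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hand-written stand-in for the parsed ProfilePage entry shown above.
profile_page = {
    "graphql": {
        "user": {
            "biography": "How people build software.",
            "blocked_by_viewer": False,
        }
    }
}

biography = dig(profile_page, "graphql", "user", "biography")
missing = dig(profile_page, "graphql", "user", "no_such_field", default="n/a")
print(biography)
print(missing)
```

With a helper like this, a renamed or missing key yields the default value instead of raising in the middle of a scrape.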