
How to Extract Instagram Data

I'm attempting to construct a Microsoft Access database of Instagram accounts, and want to extract the following data, among other things:

  • Account name
  • Number of followers
  • Number of people followed
  • Number of posts (and their dates)
  • Number of likes on each picture
  • Number of comments on each picture

I don't have any trouble constructing databases but want to know if there is an easier/faster way to get all the information without having to look through each individual picture/account and pick out the info.

Is Microsoft Access the best way to go with this? Are there better solutions?

Bryce Edwards asked Dec 31 '16 05:12


4 Answers

Why not just look at the JSON data directly by appending ?__a=1 to the profile URL:

https://www.instagram.com/<username>/?__a=1
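
A minimal sketch of reading that endpoint from Python, assuming it still returns JSON for public profiles without a login (Instagram has changed this behavior over time, so it may not). The helper names `profile_json_url` and `fetch_profile_json` are made up for illustration; only the URL pattern comes from the answer above.

```python
import json
import urllib.parse
import urllib.request


def profile_json_url(username):
    """Build the ?__a=1 URL for a profile (URL pattern taken from the answer)."""
    # quote() guards against characters that are not valid in a URL path.
    return "https://www.instagram.com/{}/?__a=1".format(urllib.parse.quote(username))


def fetch_profile_json(username, timeout=10):
    """Download and decode the JSON document; raises on HTTP or JSON errors."""
    with urllib.request.urlopen(profile_json_url(username), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the URL needs no network access.
print(profile_json_url("demo-user"))

# Hypothetical usage (performs a real request, so it is left commented out):
# data = fetch_profile_json("demo-user")
# print(sorted(data.keys()))  # inspect the top-level keys before relying on any
```

Printing the top-level keys first is deliberate: the layout of the returned document is not described here, so it is safer to inspect it than to assume field names.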

Gordon answered Nov 03 '22 00:11


This repo has it all: https://github.com/rarcega/instagram-scraper

Do read the options carefully.

Running

instagram-scraper incindia -m 500 --media-metadata --include-location --media-types none

gave me a JSON file that includes:

  • a URL to the image for each media item,
  • the type of media and the number of views,
  • the number of likes and the number of comments (--comment gives you all the comments too),

and more that I have yet to explore.

You can also download all the media.
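
Since the goal is a database, one way to bridge the gap is to flatten that JSON into a CSV file, which Access can import. The sketch below assumes the scraper output has already been loaded into a list of dictionaries. The column names in the example (`display_url`, `likes`, `comments`) are placeholders, not the scraper's actual field names, so check the real file first. The function name `records_to_csv` is hypothetical.

```python
import csv
import io


def records_to_csv(records, columns):
    """Write the chosen columns of each record to CSV text; missing keys become blanks."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Only the requested columns are kept; absent keys are written as empty cells.
        writer.writerow({col: record.get(col, "") for col in columns})
    return buffer.getvalue()


# Placeholder records; real ones would come from json.load() on the scraper's output file.
sample_records = [
    {"display_url": "https://example.com/a.jpg", "likes": 10, "comments": 2},
    {"display_url": "https://example.com/b.jpg", "likes": 7},
]
csv_text = records_to_csv(sample_records, ["display_url", "likes", "comments"])
print(csv_text)
```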

fireball.1 answered Nov 03 '22 02:11


Since this question has the 'web-scraping' keyword, allow me to share some information here.

Instagram embeds JSON data in a JavaScript block in the HTML source of each user page, for example https://www.instagram.com/user-account/. You can extract and parse this JSON with any scripting language.
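
As a rough sketch of the parsing step: at the time, the profile page's JSON was assigned to a `window._sharedData` variable inside a `<script>` tag. That variable name is an assumption recalled from how the page used to look, not something stated in this answer, and the markup may have changed since. The function name `extract_shared_data` is hypothetical.

```python
import json
import re

# Assumed marker: the page assigns its JSON to window._sharedData in a <script> tag.
_SHARED_DATA_RE = re.compile(
    r"window\._sharedData\s*=\s*(\{.*?\});</script>", re.DOTALL
)


def extract_shared_data(html):
    """Return the embedded JSON as a dict, or None if the marker is not found."""
    match = _SHARED_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Tiny stand-in page; the keys here are invented purely for the demonstration.
sample_html = (
    "<html><body><script>"
    'window._sharedData = {"user": {"username": "user-account", "posts": 12}};'
    "</script></body></html>"
)
print(extract_shared_data(sample_html))
```

Returning `None` when the marker is missing makes a changed page layout easy to detect instead of failing deep inside the JSON parser.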

Instagram shows only 10 posts per request. The first page gives you the user's basic information, such as user name, biography, number of posts, number of followers, and number following. But if you need the likes and comments for each and every photo post, you have to click the 'Load more' button.

'Load more' makes an Ajax call that includes a '?max_id' parameter and returns information for the next 10 posts. So you have to create a loop that keeps requesting the remaining data until 'max_id' is empty or null.

Example Request: First page, https://www.instagram.com/demo-user/

Next Data Request: https://www.instagram.com/demo-user/?max_id=1533276522

and so on...
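
The paging loop described above can be sketched as follows. The `fetch_page` callable is a hypothetical stand-in for whatever performs the HTTP request and parses the response; it is assumed to return the posts on that page plus the `max_id` for the next request (or `None` when there is none). How the next `max_id` is actually read from the response is not shown in this answer, so that part is left to the caller.

```python
def iter_all_posts(fetch_page, max_pages=1000):
    """Yield posts page by page until fetch_page reports no further max_id.

    fetch_page(max_id) -> (posts, next_max_id); max_id is None for the first page.
    max_pages is a safety cap so a misbehaving source cannot loop forever.
    """
    max_id = None
    for _ in range(max_pages):
        posts, next_max_id = fetch_page(max_id)
        for post in posts:
            yield post
        if not next_max_id:  # empty or null max_id means this was the last page
            break
        max_id = next_max_id


# In-memory stand-in for the site: three "pages" keyed by the max_id that requests them.
_FAKE_PAGES = {
    None: (["post-1", "post-2"], "1533276522"),
    "1533276522": (["post-3", "post-4"], "1500000000"),
    "1500000000": (["post-5"], None),
}


def fake_fetch_page(max_id):
    return _FAKE_PAGES[max_id]


all_posts = list(iter_all_posts(fake_fetch_page))
print(all_posts)
```

Passing the fetcher in as an argument keeps the loop independent of the HTTP library, so the same logic can be exercised without touching the network.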

Recently I had some spare time and was angry at Instagram ;) so I wrote a script to solve all these problems. It is written in PHP and the code is well commented, so the application flow should be easy to follow. You can read the script to see how it works and reuse the logic in any other language.

This comes from this GitHub Repository Code

And yes, it doesn't require the Instagram API or anything else. :)

Nono answered Nov 03 '22 01:11


You should definitely check out Instagram's API, which can provide all the public information you would want to scrape. You just need to write a script that makes the proper API calls (an example call is provided below).

From Instagram's website:

We do our best to have all our URLs be RESTful. Every endpoint (URL) may support one of four different http verbs. GET requests fetch information about an object, POST requests create objects, PUT requests update objects, and finally DELETE requests will delete objects.

You'll just need to have the ACCESS-TOKEN value for the relevant account ready when you use the URL in your code, and be able to unpack the JSON that Instagram returns with each GET request. If the data isn't directly available, you can always back it out indirectly.

  • Account name
  • Number of followers
  • Number of people followed

Here's a great starting point: https://www.instagram.com/developer/endpoints/users/#get_users
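
A sketch of how the account name, follower count, and following count might be unpacked from a user-endpoint response. The URL shape and the field names (`data.username`, `data.counts.followed_by`, `data.counts.follows`) are assumptions about the legacy API documented at the link above and should be verified against it. The function names are hypothetical.

```python
def user_endpoint_url(user_id, access_token):
    # Assumed URL shape for the legacy users endpoint; verify against the docs.
    return "https://api.instagram.com/v1/users/{}/?access_token={}".format(
        user_id, access_token
    )


def summarize_user(payload):
    """Pull account name, follower count, and following count from a decoded response."""
    data = payload.get("data", {})
    counts = data.get("counts", {})
    return {
        "account_name": data.get("username"),
        "followers": counts.get("followed_by"),
        "following": counts.get("follows"),
    }


# Hand-written payload in the assumed shape, used here instead of a live request.
sample_payload = {
    "data": {
        "username": "demo-user",
        "counts": {"media": 42, "follows": 120, "followed_by": 3500},
    }
}
print(summarize_user(sample_payload))
```

Using `.get()` throughout means a missing field shows up as `None` in the result rather than raising a `KeyError`, which is convenient when loading many accounts into a database.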

And here's how you would make a call to an API in Python:

# Python 3
# RestfulClient.py

import json
from getpass import getpass

import requests
from requests.auth import HTTPDigestAuth

# Replace with the correct URL
url = "http://api_url"

# It is good practice not to hardcode credentials, so ask for them at runtime
username = input("Username: ")
password = getpass("Password: ")
myResponse = requests.get(url, auth=HTTPDigestAuth(username, password), verify=True)
# print(myResponse.status_code)

# For a successful API call the status code is 200 (OK); .ok is True for any status below 400
if myResponse.ok:
    # json.loads accepts str or bytes, so the raw response content can be passed directly.
    # It converts the JSON document into a Python data structure (dict or list, depending on the JSON).
    jData = json.loads(myResponse.content)

    print("The response contains {0} properties".format(len(jData)))
    print("\n")
    # This loop assumes the top-level JSON value is an object (dict)
    for key in jData:
        # Values may be numbers or nested objects, so format them instead of concatenating
        print("{0} : {1}".format(key, jData[key]))
else:
    # If the response is an error (4xx or 5xx), raise an HTTPError with the code and reason
    myResponse.raise_for_status()

frankbacon322 answered Nov 03 '22 01:11