
Get instagram's video media source using oembed endpoints

THE CONTEXT

I have a piece of (jQuery) ajax code that had been working happily for about 9 months, until a couple of weeks ago.

This code uses Instagram's embedding endpoint, which allows me to get the media source (image or video) out of a normal Instagram link like http://instagram.com/p/BUG/, regardless of the user and without needing an access_token.

Simplified example :

var URL = "http://api.instagram.com/oembed?url=http://instagram.com/p/BUG/";
$(document).ready(function () {
    $.ajax({
        url: URL,
        dataType: "jsonp",
        cache: false,
        success: function (response) {
            console.log(response.url);
        },
        error: function () {
            console.log("couldn't process the instagram url");
        }
    });
});

In the code above, response.url would return the full media URL source like :

http://photos-a.ak.instagram.com/xxxx/1234_123456123_123456_n.jpg // image or
http://distilleryvesper3-15.ak.instagram.com/b0c957463548362858_101.mp4 // video

Then I could use the returned URL to embed the media file in my webpage.

NOTE :

Since the idea is to get the source URL of any Instagram link regardless of the user, using the media endpoints is not an option.


THE ISSUE

Instagram's oembed endpoint allows you to GET a JSON response, which until a couple of weeks ago had this structure:

{
    "provider_url" : "http:\/\/instagram.com\/",
    "media_id" : "123456789_123456789",
    "title" : "the title",
    "url" : "http:\/\/photos-a.ak.instagram.com\/hphotos-ak-xfp1\/12345678_123456789012345_1234567890_n.jpg",
    "author_name" : "{the user name}",
    "height" : 640,
    "width" : 640,
    "version" : "1.0",
    "author_url" : "http:\/\/instagram.com\/{the user name}",
    "author_id" : 123456789,
    "type" : "photo",
    "provider_name" : "Instagram"
}

As you may have noticed, my ajax code was particularly interested in the url property, which contains the full URL of the media.

Notice that this JSON response is (as of today) still valid for Instagram images. However, if the original Instagram link is a video (let's use a real example: http://instagram.com/p/mOFsFhAp4f/, by CocaCola(c)), the JSON response no longer returns any url key.

It seems that after the introduction of web embeds, Instagram decided to replace the url key with an html property in its (oembed) JSON response, for videos only. This property contains the iframe to embed, like:

{
    ...

    "html" : "\u003ciframe src=\"http:\/\/instagram.com\/p\/BUG\/embed\" width=\"616\" height=\"716\" frameborder=\"0\" scrolling=\"no\" allowtransparency=\"true\"\u003e\u003c\/iframe\u003e",

    ...
}

... and of course, that breaks my code since response.url is undefined.
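
To illustrate the two response shapes described above, here is a minimal sketch of a handler that branches on whichever key is present. The helper name mediaFromOembed is hypothetical, and note the limitation: for videos it can only return the iframe's embed page URL, not the direct .mp4 link this question is asking for.

```javascript
// Hypothetical helper: normalizes the two oembed response shapes shown above.
// Returns { kind, src }, or null when neither key is usable.
function mediaFromOembed(response) {
    // Images: the response still carries the direct media URL.
    if (response && typeof response.url === "string") {
        return { kind: "direct", src: response.url };
    }
    // Videos: only an iframe snippet is returned, so pull out its src attribute.
    if (response && typeof response.html === "string") {
        var match = /<iframe[^>]*\ssrc="([^"]+)"/i.exec(response.html);
        if (match) {
            return { kind: "embed", src: match[1] };
        }
    }
    return null;
}
```

Inside the success callback this would be called as mediaFromOembed(response), and a null result can be treated the same way as the error callback.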


THE QUESTION

How do I get the video's full URL after the changes in the Instagram JSON response?

Unfortunately I couldn't find any proper documentation or a change log on Instagram's developer site (they have a great API but poor documentation).

Please note that this question is about the Instagram API (v1) embedding endpoints; it is not a jQuery or ajax question.

I am looking for a (perhaps undocumented) Instagram API option, endpoint, oembed parameter or anything else (that doesn't require an access_token) that allows me to retrieve the direct link to the video media (preferably from a JSON response) out of a normal Instagram link, regardless of the user. I am also willing to consider a not-too-hacky workaround.

asked Jul 03 '14 21:07 by JFK


2 Answers

This may not be the best or optimal answer, but I believe it will solve your issue for now, so you may consider it a workaround:

Thanks to the whateverorigin.org service, we are able to fetch cross-origin JSON, which has all the data you may need. All you have to do is convert the returned object to a string, then use a regex (or a plain string search) to fetch whatever data you need.

var myvideourl = "http://instagram.com/p/mOFsFhAp4f/";
$.ajaxSetup({
    scriptCharset: "utf-8", //maybe "ISO-8859-1"
    contentType: "application/json; charset=utf-8"
});

$.getJSON('http://whateverorigin.org/get?url=' +
    encodeURIComponent(myvideourl) + '&callback=?',
    function (data) {
        var html = data.contents; // raw HTML of the Instagram page
        // locate the og:video meta tag and cut it out of the page
        var start = html.indexOf('<meta property="og:video" content=');
        if (start === -1) {
            console.log("no og:video meta tag found");
            return;
        }
        var end = html.indexOf('/>', start);
        var metaTag = html.slice(start, end + 2);
        // parse the tag so that its content attribute can be read
        var metaObject = $.parseHTML(metaTag);
        console.log(metaObject[0].content); // the video URL
    });

Here is an example:

JS Fiddle Demo

It works well for me, but I have only tried it on the CocaCola video; I haven't tried it on other links.
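
Because the snippet above searches for one exact attribute order in the meta tag, it may miss the tag if the page markup changes slightly. As a hedged sketch (the function name extractOgVideo is hypothetical, and it assumes the video is still exposed through an og:video meta tag), the lookup can be made a little more tolerant with a regex that works on the page contents as a plain string:

```javascript
// Hypothetical helper: returns the content of the og:video meta tag, or null.
// Works on the raw HTML string, so it needs neither the DOM nor jQuery.
function extractOgVideo(html) {
    // Collect every <meta ...> tag, whatever the order of its attributes.
    var metaTags = html.match(/<meta\b[^>]*>/gi) || [];
    for (var i = 0; i < metaTags.length; i++) {
        var tag = metaTags[i];
        // Match og:video exactly (not og:video:type and similar).
        if (/property=["']og:video["']/i.test(tag)) {
            var content = /content=["']([^"']+)["']/i.exec(tag);
            if (content) {
                return content[1];
            }
        }
    }
    return null;
}
```

In the callback above it would be used as extractOgVideo(data.contents).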

answered Oct 04 '22 22:10 by ProllyGeek


UPDATE [March 2015] : For an extended and updated version of this solution, please visit http://www.picssel.com/build-a-simple-instagram-api-case-study/


@ProllyGeek's answer provided a good workaround to scrape the Instagram video page (well deserved bounty). However, it relies on the third-party whateverorigin.org service, so it only works as long as that service stays available.

Since the latter already happened to me in a production environment, I had to look for a more reliable alternative, so I decided to use PHP's file_get_contents to scrape the video link from a self-hosted PHP module.

I basically followed the same logic proposed by @ProllyGeek, but translated to PHP:

The getVideoLink.php module :

<?php
header('Content-Type: text/html; charset=utf-8');
function clean_input($data){
    $data = trim($data);
    $data = stripslashes($data);
    $data = strip_tags($data);
    $data = htmlspecialchars($data);
    return $data;
}
// read the link from the query string; fall back to an empty string when it is missing
$instalink = clean_input( isset($_GET['instalink']) ? $_GET['instalink'] : '' );
// only fetch Instagram post links (assumption: nothing else needs to be supported),
// so the module cannot be used to read local files or other hosts
if (preg_match('#^https?://(www\.)?instagram\.com/p/#', $instalink)) {
    $response = clean_input( @ file_get_contents( $instalink ) );
    $start_marker = 'video_url&quot;:&quot;'; // htmlspecialchars() has turned " into &quot;
    $start_position = strpos( $response, $start_marker ); // the start position
    $end_position = strpos( $response, '&quot;,&quot;usertags' ); // the end position
    if ($start_position !== false && $end_position !== false) {
        $start_position += strlen($start_marker); // skip the marker itself
        $mp4_link = substr( $response, $start_position, $end_position - $start_position );
        echo $mp4_link;
    }
}
?>

Of course, you may need to analyze the response manually to know what you are looking for.

Then the AJAX call to the PHP module from my main page :

var instaLink = "http://instagram.com/p/mOFsFhAp4f/"; // the Coca Cola video link
jQuery(document).ready(function ($) {
    $.ajax({
        url: "getVideoLink.php?instalink=" + encodeURIComponent(instaLink), // encode the link so it survives as a query parameter
        dataType : "html",
        cache : false,
        success : function (data) {
            console.log(data); // returns http://distilleryvesper3-15.ak.instagram.com/b0ce80e6b91111e3a16a122b8b9af17f_101.mp4
        },
        error : function () {
            console.log("error in ajax");
        }
    });
}); // ready 

This method assumes that your host supports PHP.


EDIT [November 19, 2014]

I have modified the getVideoLink.php module (now getInstaLinkJSON.php) to actually get the JSON information from a specific Instagram media link like http://instagram.com/p/mOFsFhAp4f/

This is much more useful than just scraping the video's URL and can be used for images too.

The new getInstaLinkJSON.php code :

<?php
// light cleanup for the fetched page: slashes and quotes are kept so the JSON stays valid
function clean_input($data){
    $data = trim($data);
    $data = strip_tags($data);
    return $data;
}
// clean user input
function clean_input_all($data){
    $data = trim($data);
    $data = stripslashes($data);
    $data = strip_tags($data);
    $data = htmlspecialchars($data);
    return $data;
}
// fall back to an empty string when the parameter is missing
$instaLink = clean_input_all( isset($_GET['instaLink']) ? $_GET['instaLink'] : '' );

// only fetch Instagram post links (assumption: nothing else needs to be supported),
// so the module cannot be used to read local files or other hosts
if( preg_match('#^https?://(www\.)?instagram\.com/p/#', $instaLink) ){
    header('Content-Type: application/json; charset=utf-8');
    $response = clean_input( @ file_get_contents($instaLink) );
    $marker = 'window._sharedData = ';
    $start_position = strpos( $response, $marker ); // the start position
    if( $start_position === false ){
        die(); // marker not found, nothing to return
    }
    $trimmed = trim( substr($response, $start_position + strlen($marker)) ); // trim extra spaces and carriage returns
    $jsondata = substr( $trimmed, 0, -1); // remove extra ";" added at the end of the javascript variable
    echo $jsondata;
} else {
    die(); //only accepts instaLink as parameter
}
?>

I am sanitizing both the user's input and the file_get_contents() response. However, I am not stripping slashes or escaping HTML characters in the latter, since I will be returning a JSON response.
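
For readers who want to try the extraction step against a saved copy of the page, here is a minimal JavaScript sketch of the same idea as getInstaLinkJSON.php. The function name extractSharedData is hypothetical, and the sketch assumes the page contains a window._sharedData = {...}; assignment that is immediately followed by the closing script tag.

```javascript
// Hypothetical helper: pulls the window._sharedData object out of a page's HTML.
// Returns the parsed object, or null if the marker is missing or the JSON is invalid.
function extractSharedData(html) {
    var marker = "window._sharedData = ";
    var start = html.indexOf(marker);
    if (start === -1) {
        return null;
    }
    // Assumption: the assignment ends with ";" right before the closing script tag.
    var end = html.indexOf(";</script>", start);
    if (end === -1) {
        return null;
    }
    try {
        return JSON.parse(html.slice(start + marker.length, end));
    } catch (e) {
        return null; // the markup changed, or the slice was not valid JSON
    }
}
```

Parsing the slice with JSON.parse, instead of echoing it as-is, means a change in the page markup produces a null rather than a broken response.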

Then the AJAX call:

var instaLink = "http://instagram.com/p/mOFsFhAp4f/"; // demo
jQuery.ajax({
    url: "getInstaLinkJSON.php?instaLink=" + encodeURIComponent(instaLink), // parameter name must match $_GET['instaLink'] (case-sensitive)
    dataType : "json", // important!!!
    cache : false,
    success : function ( response ) {
        console.log( response ); // returns json
        var media = response.entry_data.DesktopPPage[0].media;

        // get the video URL
        // media.is_video : returns true/false

        if( media.is_video ){
            console.log( media.video_url ); // returns http://distilleryvesper3-15.ak.instagram.com/b0ce80e6b91111e3a16a122b8b9af17f_101.mp4
        }
    },
    error : function () {
        console.log("error in ajax");
    }
});
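
The path entry_data.DesktopPPage[0].media comes from scraping the page rather than from a documented API, so it may change or be missing. A minimal sketch of a more defensive lookup (the function name videoUrlFromSharedData is hypothetical) could look like this:

```javascript
// Hypothetical helper: walks the scraped JSON step by step and returns the
// video URL, or null when any level is missing or the media is not a video.
function videoUrlFromSharedData(data) {
    var pages = data && data.entry_data && data.entry_data.DesktopPPage;
    var media = pages && pages[0] && pages[0].media;
    if (media && media.is_video && media.video_url) {
        return media.video_url;
    }
    return null;
}
```

In the success callback, videoUrlFromSharedData(response) would replace the direct property access, so a changed response is reported as null instead of throwing a TypeError.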

EDIT [May 20, 2020]

Currently working PHP:

<?php
header("Access-Control-Allow-Origin: *");
header("Access-Control-Allow-Headers: *");
// light cleanup for the fetched page: slashes and quotes are kept so the JSON stays valid
function clean_input($data){
    $data = trim($data);
    $data = strip_tags($data);
    return $data;
}
// clean user input
function clean_input_all($data){
    $data = trim($data);
    $data = stripslashes($data);
    $data = strip_tags($data);
    $data = htmlspecialchars($data);
    return $data;
}
// fall back to an empty string when the parameter is missing
$instaLink = clean_input_all( isset($_GET['instaLink']) ? $_GET['instaLink'] : '' );

// only fetch Instagram post links (assumption: nothing else needs to be supported),
// so the module cannot be used to read local files or other hosts
if( preg_match('#^https?://(www\.)?instagram\.com/p/#', $instaLink) ){
    header('Content-Type: application/json; charset=utf-8');
    $response = clean_input( @ file_get_contents($instaLink) );
    $marker = 'window._sharedData = ';
    $start_position = strpos( $response, $marker ); // the start position
    if( $start_position === false ){
        die(); // marker not found, nothing to return
    }
    $trimmed = trim( substr($response, $start_position + strlen($marker)) ); // trim extra spaces and carriage returns
    $jsondata = substr( $trimmed, 0, -1); // remove extra ";" added at the end of the javascript variable
    // keep only the part before the next script statement, then drop its trailing ";"
    $jsondata = explode('window.__initialDataLoaded', $jsondata);
    echo substr(trim($jsondata[0]), 0, -1);
} else {
    die(); //only accepts instaLink as parameter
}
?>
answered Oct 05 '22 00:10 by JFK