
Downloading *.mp4 files with Python

I'm trying to download and save lecture videos from a website. While I've been successful in downloading the files, they won't play in my media player. Here is the code I'm using:

from bs4 import BeautifulSoup
import re
import urllib2

snippet = open('Python/SNA Page Source Revised.txt', 'r')
soup = BeautifulSoup(snippet)

# href=True skips anchors without an href, which would otherwise yield None
links = [link.get('href') for link in soup.find_all('a', href=True)]

videos = []

for link in links:
  match = re.search('.*mp4.*', link)
  if match:
    videos.append(link)

vidNum = 1

for video in videos:
  f = urllib2.urlopen(video)
  with open('Data Analysis/Social Network Analysis/Video ' + str(vidNum) + '.mp4', 'wb') as code:
    code.write(f.read())
  vidNum += 1

Everything seems to work fine, but when I try to play one of the videos, I get this error: "Python (v2.7) requires to install plugins to play media files of the following type: text/html decoder" In addition, if I download the video from the website manually, the file is approximately 22.8MB, but when I use my script, the file is only 7.8kB.

Am I doing something wrong with the way I'm downloading the file? Any help would be greatly appreciated.

Also: I'm running Python v2.7 on Ubuntu 12.04 LTS.

****EDIT****

Here is the code I'm using based on responses I've received:

import requests

r = requests.get('https://class.coursera.org/sna-003/lecture/download.mp4?lecture_id=2', auth=('myUsername', 'myPassword'))

with open('Data Analysis/TestFile.mp4', 'wb') as fd:
  fd.write(r.content)

Here is the output of r.content:

<!DOCTYPE html>
<html itemtype="http://schema.org" xmlns:fb="http://ogp.me/ns/fb#"><head><meta content="IE=Edge,chrome=IE7" http-equiv="X-UA-Compatible"/><meta content="!" name="fragment"/><meta content="NOODP" name="robots"/><meta charset="utf-8"/><meta content="Coursera" property="og:title"/><meta content="website" property="og:type"/><meta content="http://s3.amazonaws.com/coursera/media/Coursera_Computer_Narrow.png" property="og:image"/><meta content="https://www.coursera.org/" property="og:url"/><meta content="Coursera" property="og:site_name"/><meta content="en_US" property="og:locale"/><meta content="Take free online classes from 80+ top universities and organizations. Coursera is a social entrepreneurship company partnering with Stanford University, Yale University, Princeton University and others around the world to offer courses online for anyone to take, for free. We believe in connecting people to a great education so that anyone around the world can learn without limits." property="og:description"/><meta content="727836538,4807654" property="fb:admins"/><meta content="274998519252278" property="fb:app_id"/><meta content="Take free online classes from 80+ top universities and organizations. Coursera is a social entrepreneurship company partnering with Stanford University, Yale University, Princeton University and others around the world to offer courses online for anyone to take, for free. We believe in connecting people to a great education so that anyone around the world can learn without limits." name="description"/><meta content="http://s3.amazonaws.com/coursera/media/Coursera_Computer_Narrow.png" name="image"/><meta content="app-id=736535961" name="apple-itunes-app"/><script>window.onerror = function(message, url, lineNum) {

  // First check the URL and line number of the error
  url = url || window.location.href;
  // 99% of the time, errors without line numbers arent due to our code,
  // they are due to third party plugins and browser extensions
  if (lineNum === undefined || lineNum == null) return;

  // Now figure out the actual error message
  // If it's an event, as triggered in several browsers
  if (message.target &amp;&amp; message.type) {
    message = message.type;
  }
  if (!message.indexOf) {
    message = 'Non-string, non-event error: ' + (typeof message);
  }

  var errorDescrip = {
    message: message,
    script: url,
    line: lineNum,
    url: document.URL
  }

  var err = {
    key: 'page.error.javascript', 
    value: errorDescrip
  }

  window._204 = window._204 || [];
  window._204.push(err);

  window._gaq = window._gaq || [];
  window._gaq.push(err);
}</script><title>Coursera.org</title><link href="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/css/home.css" rel="stylesheet" type="text/css"/><link href="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/pages/auth/css/auth.css" rel="stylesheet" type="text/css"/><script data-baseurl="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/" id="_mobile">(function(el) {
  // Override certian behaviour if the page is for our mobile app.
  // TODO(priya) Remove this conditional behaviour once I want to push this behaviour
  // for regular authentication pages on mobile/smaller screens as well.
  // Currently I'm keeping existing behaviour same and only adding mobile specific
  // layouts ot /mobilesignup page (which is what isMobileApp = true signifies).
  if ("false" == "true") {
    var head = document.getElementsByTagName('head')[0];
    // Add viewport meta tag
    var viewport = document.querySelector('meta[name=viewport]');
    var viewportContent = 'width=device-width, initial-scale=1.0, user-scalable=no';
    if (!viewport) {
        viewport = document.createElement('meta');
        viewport.setAttribute('name', 'viewport');
        head.appendChild(viewport);
    }
    viewport.setAttribute('content', viewportContent);

    // Add responsive css
    var link  = document.createElement('link');
    link.rel  = 'stylesheet';
    link.type = 'text/css';
    link.href = el.getAttribute("data-baseurl") + "pages/auth/css/auth_responsive.css";
    head.appendChild(link);
  }
})(document.getElementById("_mobile"));
</script></head><body><div id="fb-root"></div><div id="origami"><div style="position:absolute;top:0px;left:0px;width:100%;height:100%;background:#f5f5f5;padding-top:5%;"><div id="coursera-loading-nojs" style="text-align:center; margin-bottom:10px;display:none;">Please use a <a href="/browsers">modern browser </a> with JavaScript enabled to use Coursera.</div><div><span id="coursera-loading-js" style="display: none; padding-left:45%">loading   <img src="https://d2wvvaown1ul17.cloudfront.net/site-static/images/icons/loading.gif"/></span></div><noscript><div style="text-align:center; margin-bottom:10px;">Please use a <a href="/browsers">modern browser </a> with JavaScript enabled to use Coursera.</div></noscript></div></div><!--[if gte IE 8]&gt;&lt;script&gt;document.getElementById("coursera-loading-js").style.display = 'block';&lt;/script&gt;&lt;![endif]-->
<!--[if lte IE 7]&gt;&lt;script&gt;document.getElementById("coursera-loading-nojs").style.display = 'block';
window._204 = window._204 || [];
window._gaq = window._gaq || [];

window._gaq.push(
    ['_setAccount', 'UA-28377374-1'],
    ['_setDomainName', window.location.hostname],
    ['_setAllowLinker', true],
    ['_trackPageview', window.location.pathname]);

window._204.push(
  ['client', 'home'],
  {key:"pageview", value:window.location.pathname});
  &lt;/script&gt;&lt;script src="https://eventing.coursera.org/204.min.js"&gt;&lt;/script&gt;&lt;script src="https://ssl.google-analytics.com/ga.js"&gt;&lt;/script&gt;&lt;![endif]-->
<!--[if !IE]&gt; --><script>document.getElementById("coursera-loading-js").style.display = 'block';</script><!-- &lt;![endif]--><script src="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/js/core/require.js" type="text/javascript"></script><script data-baseurl="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/" data-debug="0" data-locale="" data-timestamp="1386838999742" data-version="e47434615f57601f9b9ccaf255a589e8550d328d" id="_require" type="text/javascript">if(document.getElementById("coursera-loading-js").style.display == 'block') {
  (function(el) {
     // prevent throw
     require.onError = function(err) {
       window._204 = window._204 || [];
       window._204.push({key: 'requireErr', value: err});
     };

     define("pages/auth/authConfig",
         function() {
             return {"coursera_url": "https://www.coursera.org/",
                     "environment": "production"};
     }
     );

     require.config({
       enforceDefine: false,
       waitSeconds: 14,
       baseUrl: el.getAttribute("data-baseurl"),
       urlArgs: el.getAttribute("data-debug") == "1" ? "v=" + el.getAttribute("data-timestamp") : "",
       shim: {
          "underscore": {
             exports: '_'
          },
          "backbone": {
             deps: ['underscore', 'jquery'],
             exports: 'Backbone'
          }
       },
       paths: {
          "jquery":       "js/core/jquery",
          "underscore":   "js/core/underscore",
          "backbone":     "js/core/backbone",
          "i18n":         "js/core/i18n._t"
       },
       callback: function() {
         require(["pages/auth/routes"]); // bootup coursera
       },
       config: {
         i18n: {
           locale: (window.localStorage ? localStorage.getItem("locale") : '') || el.getAttribute("data-locale")
         }
       }
     });
  })(document.getElementById("_require"));
}</script><script type="text/javascript">define("pages/home/models/user.json", [], function(){
  return null;
});
</script></body></html>

I find this strange, because the content just looks like the source code of the website, yet when I open r.url in my browser, it prompts me to save or view the video. Even when I pass that new URL back to requests, which I assume contains my cookie information, I still get the same content. I don't understand where I'm going wrong.

tblznbits asked Feb 14 '23 07:02


2 Answers

First, download and install the requests package.

Then use this code:

import requests

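def downloadfile(name, url):
    # Save the resource at `url` to "<name>.mp4", streaming it in chunks.
    name = name + ".mp4"
    # Pass the url variable itself, not the literal string 'url';
    # stream=True avoids loading the whole video into memory at once.
    r = requests.get(url, stream=True)
    print "****Connected****"
    with open(name, 'wb') as f:
        print "Downloading....."
        for chunk in r.iter_content(chunk_size=255):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    print "Done"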

jim answered Feb 16 '23 22:02


You need to have a valid cookie, so that you don't download the login page.
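
One way to confirm that this is what is happening is to check what the server actually returned before saving it. The helper below is a minimal sketch and not part of the original answer: `looks_like_login_page` is a hypothetical name, and it only inspects the Content-Type header and the first bytes of the body. It should run unchanged on Python 2.7 and Python 3.

```python
def looks_like_login_page(content_type, first_bytes):
    """Return True when a response is probably an HTML page, not a video."""
    # A Content-Type such as "text/html; charset=utf-8" is the clearest sign.
    if content_type and content_type.split(";")[0].strip().lower() == "text/html":
        return True
    # Otherwise, sniff the start of the body for HTML markup.
    head = first_bytes.lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

With Requests, you could call it as `looks_like_login_page(r.headers.get('Content-Type'), r.content[:64])` and skip writing the file when it returns True. The 7.8kB file and the "text/html" player error described in the question are consistent with this case.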

Here is how to set a cookie with urllib2:

import urllib2
opener = urllib2.build_opener()
opener.addheaders.append(('Cookie', 'cookiename=cookievalue'))
f = opener.open("http://example.com/")

You could also use cookielib for more browser-like behavior: go through the login process once, keep the cookie the site sets, and use it to download your video.
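
A cookie-aware opener could be set up roughly as follows. This is a sketch under assumptions: `make_cookie_opener` is a hypothetical helper name, and the try/except only exists so that the same code also runs on Python 3, where the modules are named `http.cookiejar` and `urllib.request`.

```python
try:  # Python 2.7, as used in the question
    import cookielib as cookiejar
    import urllib2 as urlrequest
except ImportError:  # Python 3 names for the same modules
    import http.cookiejar as cookiejar
    import urllib.request as urlrequest


def make_cookie_opener():
    """Build an opener that stores received cookies and resends them."""
    jar = cookiejar.CookieJar()
    # HTTPCookieProcessor saves Set-Cookie headers into the jar and
    # attaches matching cookies to every later request on this opener.
    opener = urlrequest.build_opener(urlrequest.HTTPCookieProcessor(jar))
    return opener, jar
```

You would first use the returned opener to submit the site's login form, then call `opener.open(...)` on the video URL with the same opener so the session cookie is sent along. The login URL and form field names depend on the site and are not shown here.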

Another option is Requests, which is similar to urllib2 but much easier to use, to automate the login process.
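
With Requests, that could look something like the sketch below. The function names are hypothetical, and so are the login URL and form fields you would pass in; you have to look at the real site's login form to find those. A `requests.Session` keeps the cookies from the login response and sends them with later requests.

```python
def open_logged_in_session(login_url, form_fields):
    """Log in once and return a session that carries the resulting cookies."""
    # Third-party package (pip install requests); imported here so that
    # save_stream below can be used and tested without it.
    import requests
    session = requests.Session()
    session.post(login_url, data=form_fields).raise_for_status()
    return session


def save_stream(session, url, path, chunk_size=64 * 1024):
    """Stream `url` to `path` through `session`; return the bytes written."""
    response = session.get(url, stream=True)
    response.raise_for_status()
    written = 0
    with open(path, "wb") as out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip keep-alive chunks
                out.write(chunk)
                written += len(chunk)
    return written
```

If the number of bytes returned by `save_stream` is only a few kilobytes for a lecture video, the login most likely did not succeed and you saved an HTML page again.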

Christian Schmitt answered Feb 16 '23 20:02