How to download an image with Python 3/Selenium if the URL begins with "blob:"?

When using WhatsApp Web (web.whatsapp.com), the link to a received image may look like this:

blob:https://web.whatsapp.com/3565e574-b363-4aca-85cd-2d84aa715c39

If the link is pasted into the browser's address bar, it opens the image. If the "blob:" prefix is left out, it simply opens a new WhatsApp Web window.

I am trying to download the image displayed by this link.

But common techniques such as requests, urllib.request, or even BeautifulSoup all fail at the same point: the "blob:" at the beginning of the URL raises an error.

The answers to "Download file from Blob URL with Python" throw either the error

URLError: <urlopen error unknown url type: blob>

or the error

InvalidSchema: No connection adapters were found for 'blob:https://web.whatsapp.com/f50eac63-6a7f-48a4-a2b8-8558a9ffe015'

(using BeautifulSoup)

Using a naive approach like:

import requests

url = 'https://web.whatsapp.com/f50eac63-6a7f-48a4-a2b8-8558a9ffe015'
fileName = 'test.png'
req = requests.get(url)
# Write the response body to disk in chunks of 100000 bytes.
with open(fileName, 'wb') as file:
    for chunk in req.iter_content(100000):
        file.write(chunk)

will simply result in the same error as with BeautifulSoup.

I am controlling Chrome with Selenium in Python, but I have been unable to download the image using this link.

Kev1n91 asked Nov 21 '17 23:11


2 Answers

A blob is a file-like object of raw data stored by the browser.

You can see them at chrome://blob-internals/
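
A blob URL has the form blob:<origin>/<uuid>, so its scheme is "blob" rather than "https", which is why HTTP clients such as urllib and requests reject it. The following is a minimal sketch using only the standard library; the helper names is_blob_url and blob_origin are hypothetical and introduced here for illustration.

```python
from urllib.parse import urlsplit


def is_blob_url(url):
    """Return True if the URL uses the browser-only "blob" scheme."""
    return urlsplit(url).scheme == "blob"


def blob_origin(url):
    """Return the origin (scheme://host) embedded in a blob URL."""
    outer = urlsplit(url)
    if outer.scheme != "blob":
        raise ValueError("not a blob URL: %s" % url)
    # Everything after "blob:" is parsed as the path; parse it again
    # to recover the origin of the page that created the blob.
    inner = urlsplit(outer.path)
    return "%s://%s" % (inner.scheme, inner.netloc)


url = "blob:https://web.whatsapp.com/3565e574-b363-4aca-85cd-2d84aa715c39"
print(is_blob_url(url))
print(blob_origin(url))
```

The origin tells you which page the browser must currently be on for the blob to be readable from a script.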

It's possible to get the content of a blob with Selenium through script injection. However, you'll have to comply with the same-origin policy by running the script on the page/domain that created the blob:

import base64

def get_file_content_chrome(driver, uri):
  # Fetch the blob inside the page and hand it back to Python as base64 text.
  result = driver.execute_async_script("""
    var uri = arguments[0];
    var callback = arguments[1];
    var toBase64 = function(buffer){for(var r,n=new Uint8Array(buffer),t=n.length,a=new Uint8Array(4*Math.ceil(t/3)),i=new Uint8Array(64),o=0,c=0;64>c;++c)i[c]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".charCodeAt(c);for(c=0;t-t%3>c;c+=3,o+=4)r=n[c]<<16|n[c+1]<<8|n[c+2],a[o]=i[r>>18],a[o+1]=i[r>>12&63],a[o+2]=i[r>>6&63],a[o+3]=i[63&r];return t%3===1?(r=n[t-1],a[o]=i[r>>2],a[o+1]=i[r<<4&63],a[o+2]=61,a[o+3]=61):t%3===2&&(r=(n[t-2]<<8)+n[t-1],a[o]=i[r>>10],a[o+1]=i[r>>4&63],a[o+2]=i[r<<2&63],a[o+3]=61),new TextDecoder("ascii").decode(a)};
    var xhr = new XMLHttpRequest();
    xhr.responseType = 'arraybuffer';
    xhr.onload = function(){ callback(toBase64(xhr.response)) };
    xhr.onerror = function(){ callback(xhr.status) };
    xhr.open('GET', uri);
    xhr.send();
    """, uri)
  # On failure the script passes back the numeric XHR status instead of text.
  if isinstance(result, int):
    raise Exception("Request failed with status %s" % result)
  return base64.b64decode(result)

data = get_file_content_chrome(driver, "blob:https://developer.mozilla.org/7f9557f4-d8c8-4353-9752-5a49e85058f5")
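
Since the question is about saving the image, here is a minimal sketch of writing the returned bytes to disk. It is an illustration, not part of the original answer: the helper names guess_extension and save_image are hypothetical, and the extension is guessed from the leading "magic" bytes of a few common image formats.

```python
from pathlib import Path

# Leading "magic" bytes of common image formats mapped to a file extension.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}


def guess_extension(data, default=".bin"):
    """Guess a file extension from the first bytes of the content."""
    for signature, extension in _SIGNATURES.items():
        if data.startswith(signature):
            return extension
    # WebP files start with "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return ".webp"
    return default


def save_image(data, stem, directory="."):
    """Write the bytes to <directory>/<stem><ext> and return the path."""
    target = Path(directory) / (stem + guess_extension(data))
    target.write_bytes(data)
    return target
```

With a live session, something like save_image(get_file_content_chrome(driver, blob_url), "test") would then write test.png or test.jpg depending on the content, where blob_url is the src of the image element.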

Florent B. answered Sep 17 '22 21:09


For anyone trying to do the same with Node.js and Selenium, see the script below.

var script = function (blobUrl) {
    // Selenium passes the blob URL first and its completion callback last.
    var uri = arguments[0];
    var callback = arguments[arguments.length - 1];
    var toBase64 = function(buffer) {
        for(var r,n=new Uint8Array(buffer),t=n.length,a=new Uint8Array(4*Math.ceil(t/3)),i=new Uint8Array(64),o=0,c=0;64>c;++c)
            i[c]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".charCodeAt(c);for(c=0;t-t%3>c;c+=3,o+=4)r=n[c]<<16|n[c+1]<<8|n[c+2],a[o]=i[r>>18],a[o+1]=i[r>>12&63],a[o+2]=i[r>>6&63],a[o+3]=i[63&r];return t%3===1?(r=n[t-1],a[o]=i[r>>2],a[o+1]=i[r<<4&63],a[o+2]=61,a[o+3]=61):t%3===2&&(r=(n[t-2]<<8)+n[t-1],a[o]=i[r>>10],a[o+1]=i[r>>4&63],a[o+2]=i[r<<2&63],a[o+3]=61),new TextDecoder("ascii").decode(a)
    };
    var xhr = new XMLHttpRequest();
    xhr.responseType = 'arraybuffer';
    xhr.onload = function(){ callback(toBase64(xhr.response)) };
    xhr.onerror = function(){ callback(xhr.status) };
    xhr.open('GET', uri);
    xhr.send();
}
driver.executeAsyncScript(script, imgEleSrc).then((result) => {
    console.log(result);
})

For a detailed explanation, see https://medium.com/@anoop.goudar/how-to-get-data-from-blob-url-to-node-js-server-using-selenium-88b1ad57e36d
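
Both scripts above carry a hand-written toBase64 helper. One possible alternative, not taken from either answer, is to let the browser do the encoding with fetch and FileReader.readAsDataURL and then split the resulting data URL in Python. The sketch below assumes driver is a Selenium WebDriver already on the page that created the blob; the names _READ_BLOB_JS, split_data_url and get_blob_via_data_url are hypothetical.

```python
import base64

# JavaScript run inside the page; the last argument is Selenium's callback.
_READ_BLOB_JS = """
var uri = arguments[0];
var done = arguments[arguments.length - 1];
fetch(uri)
  .then(function (response) { return response.blob(); })
  .then(function (blob) {
    var reader = new FileReader();
    reader.onload = function () { done(reader.result); };
    reader.onerror = function () { done(null); };
    reader.readAsDataURL(blob);
  })
  .catch(function () { done(null); });
"""


def split_data_url(data_url):
    """Split 'data:<mime>;base64,<payload>' into (mime, decoded bytes)."""
    header, separator, payload = data_url.partition(",")
    if (not separator or not header.startswith("data:")
            or not header.endswith(";base64")):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def get_blob_via_data_url(driver, blob_url):
    """Read a blob URL in the browser and return (mime type, bytes)."""
    result = driver.execute_async_script(_READ_BLOB_JS, blob_url)
    # The script passes back null (None in Python) when the read fails.
    if not isinstance(result, str):
        raise RuntimeError("could not read %s in the browser" % blob_url)
    return split_data_url(result)
```

The returned MIME type comes from the blob itself, so it can also be used to pick a file extension when saving.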

AnoopGoudar answered Sep 21 '22 21:09