When using web.whatsapp.de, one can see that the link to a received image may look like this:
blob:https://web.whatsapp.com/3565e574-b363-4aca-85cd-2d84aa715c39
If the link is pasted into the browser's address bar, it opens the image; however, if "blob:" is left out, it simply opens a new WhatsApp Web window.
I am trying to download the image displayed by this link.
But common techniques such as requests, urllib.request, or even BeautifulSoup always fail at one point: the "blob" at the beginning of the URL throws an error.
The answers to "Download file from Blob URL with Python" throw either the error
URLError: <urlopen error unknown url type: blob>
or the Error
InvalidSchema: No connection adapters were found for 'blob:https://web.whatsapp.com/f50eac63-6a7f-48a4-a2b8-8558a9ffe015'
(using BeautifulSoup)
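Both errors have the same cause: neither urllib nor requests has a handler for the blob: scheme, because a blob URL only points at data held in the memory of the browser page that created it. As a rough sketch (the helper name is_blob_url is made up for illustration), the scheme can be checked up front with the standard library before attempting a download:

```python
from urllib.parse import urlsplit


def is_blob_url(url: str) -> bool:
    """Return True if the URL uses the browser-only blob: scheme."""
    return urlsplit(url).scheme == "blob"


url = "blob:https://web.whatsapp.com/3565e574-b363-4aca-85cd-2d84aa715c39"
if is_blob_url(url):
    # requests/urllib cannot resolve this; it has to be read inside the browser.
    print("blob URL: read it from within the browser session")
else:
    print("regular URL: requests or urllib can fetch it")
```

A check like this only makes the failure explicit; it does not make the blob reachable from outside the browser.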
Using a naive approach like:
import requests

url = 'https://web.whatsapp.com/f50eac63-6a7f-48a4-a2b8-8558a9ffe015'
fileName = 'test.png'
req = requests.get(url)
with open(fileName, 'wb') as file:
    for chunk in req.iter_content(100000):
        file.write(chunk)
will simply result in the same error as using BeautifulSoup.
I am controlling Chrome using Selenium in Python; however, I was unable to download the image correctly using the provided link.
We can download images with Selenium WebDriver in Python. First, we identify the image we want to download using a locator such as id, class, or XPath. We then use the open method to open a file in write and binary mode (represented by wb) and write the image data to it.
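The write step might look like the following minimal sketch. It assumes the image bytes are already in hand (for example from a Selenium element's screenshot_as_png property); save_image and the sample data are hypothetical and used only for illustration.

```python
import os
import tempfile


def save_image(data: bytes, path: str) -> int:
    """Write raw image bytes to path in binary mode and return the byte count."""
    with open(path, "wb") as f:  # "wb" = write + binary
        return f.write(data)


# Placeholder bytes standing in for real image data (the 8-byte PNG signature).
sample = b"\x89PNG\r\n\x1a\n"
target = os.path.join(tempfile.mkdtemp(), "test.png")
written = save_image(sample, target)
print(written, "bytes written to", target)
```

Binary mode matters here: opening the file in text mode would reject bytes or alter line endings on some platforms.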
A blob is a file-like object of raw data stored by the browser.
You can see them at chrome://blob-internals/
It's possible to get the content of a blob with Selenium through script injection. However, you'll have to comply with the same-origin policy by running the script on the page/domain that created the blob:
import base64

def get_file_content_chrome(driver, uri):
    # Fetch the blob inside the page and hand it back as a base64 string.
    result = driver.execute_async_script("""
        var uri = arguments[0];
        var callback = arguments[1];
        var toBase64 = function(buffer){for(var r,n=new Uint8Array(buffer),t=n.length,a=new Uint8Array(4*Math.ceil(t/3)),i=new Uint8Array(64),o=0,c=0;64>c;++c)i[c]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".charCodeAt(c);for(c=0;t-t%3>c;c+=3,o+=4)r=n[c]<<16|n[c+1]<<8|n[c+2],a[o]=i[r>>18],a[o+1]=i[r>>12&63],a[o+2]=i[r>>6&63],a[o+3]=i[63&r];return t%3===1?(r=n[t-1],a[o]=i[r>>2],a[o+1]=i[r<<4&63],a[o+2]=61,a[o+3]=61):t%3===2&&(r=(n[t-2]<<8)+n[t-1],a[o]=i[r>>10],a[o+1]=i[r>>4&63],a[o+2]=i[r<<2&63],a[o+3]=61),new TextDecoder("ascii").decode(a)};
        var xhr = new XMLHttpRequest();
        xhr.responseType = 'arraybuffer';
        xhr.onload = function(){ callback(toBase64(xhr.response)) };
        xhr.onerror = function(){ callback(xhr.status) };
        xhr.open('GET', uri);
        xhr.send();
        """, uri)
    # On failure the script returns the numeric XHR status instead of a string.
    if isinstance(result, int):
        raise Exception("Request failed with status %s" % result)
    return base64.b64decode(result)

# The driver must currently be on the page that created this blob.
content = get_file_content_chrome(driver, "blob:https://developer.mozilla.org/7f9557f4-d8c8-4353-9752-5a49e85058f5")
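Because the script has to run on the page that created the blob, it can help to compare origins before injecting it. The sketch below uses only the standard library; blob_origin and same_origin are hypothetical helper names, and in practice the page URL would come from driver.current_url. It compares scheme and host literally, so an explicit default port (such as :443) would count as a mismatch.

```python
from urllib.parse import urlsplit


def blob_origin(blob_url: str) -> str:
    """Return the origin (scheme://host[:port]) embedded in a blob URL."""
    parts = urlsplit(blob_url)
    if parts.scheme != "blob":
        raise ValueError("not a blob URL: %r" % blob_url)
    inner = urlsplit(parts.path)  # the part after "blob:" is itself a URL
    return "%s://%s" % (inner.scheme, inner.netloc)


def same_origin(page_url: str, blob_url: str) -> bool:
    """True if the page URL and the blob URL share scheme and host[:port]."""
    page = urlsplit(page_url)
    return "%s://%s" % (page.scheme, page.netloc) == blob_origin(blob_url)


blob = "blob:https://developer.mozilla.org/7f9557f4-d8c8-4353-9752-5a49e85058f5"
print(same_origin("https://developer.mozilla.org/en-US/docs/Web/API/Blob", blob))
print(same_origin("https://web.whatsapp.com/", blob))
```

If the check fails, navigating the driver to the blob's origin will not help on its own, since the blob only exists in the page that created it; the script needs to run in that same page.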
For people trying to do the same with Node.js and Selenium, see the script below.
var script = function (blobUrl) {
    console.log(arguments);
    var uri = arguments[0];
    // Selenium passes the completion callback as the last argument.
    var callback = arguments[arguments.length - 1];
    var toBase64 = function(buffer) {
        for(var r,n=new Uint8Array(buffer),t=n.length,a=new Uint8Array(4*Math.ceil(t/3)),i=new Uint8Array(64),o=0,c=0;64>c;++c)
        i[c]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".charCodeAt(c);for(c=0;t-t%3>c;c+=3,o+=4)r=n[c]<<16|n[c+1]<<8|n[c+2],a[o]=i[r>>18],a[o+1]=i[r>>12&63],a[o+2]=i[r>>6&63],a[o+3]=i[63&r];return t%3===1?(r=n[t-1],a[o]=i[r>>2],a[o+1]=i[r<<4&63],a[o+2]=61,a[o+3]=61):t%3===2&&(r=(n[t-2]<<8)+n[t-1],a[o]=i[r>>10],a[o+1]=i[r>>4&63],a[o+2]=i[r<<2&63],a[o+3]=61),new TextDecoder("ascii").decode(a)
    };
    var xhr = new XMLHttpRequest();
    xhr.responseType = 'arraybuffer';
    xhr.onload = function(){ callback(toBase64(xhr.response)) };
    xhr.onerror = function(){ callback(xhr.status) };
    xhr.open('GET', uri);
    xhr.send();
}
// imgEleSrc is the blob URL taken from the image element's src attribute.
driver.executeAsyncScript(script, imgEleSrc).then((result) => {
    // result is the base64-encoded content, or the XHR status on error.
    console.log(result);
})
For a detailed explanation, see https://medium.com/@anoop.goudar/how-to-get-data-from-blob-url-to-node-js-server-using-selenium-88b1ad57e36d