How to parse data-uri in python?

HTML image elements have this simplified format:

<img src='something'>

That something can be a data URI, for example:

...

Is there a standard way of parsing this with Python, so that I get the content type and the base64 data separately, or should I write my own parser for this?

blueFast asked Nov 23 '15 11:11


3 Answers

Split the data URI on the first comma to separate the header from the base64-encoded data. Call base64.b64decode to decode the data to bytes, then write the bytes to a file.

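from base64 import b64decode

data_uri = "..."

# Works on any Python version: split on the first comma,
# then decode the base64 payload to bytes.
header, encoded = data_uri.split(",", 1)
data = b64decode(encoded)

# Alternative for Python 3.4+: urlopen understands data: URIs.
# from urllib import request
# with request.urlopen(data_uri) as response:
#     data = response.read()

# Write the decoded bytes to a file.
with open("image.png", "wb") as f:
    f.write(data)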
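
The snippet above discards the header, while the question also asks for the content type. A minimal sketch of getting both might look like the following. The helper name split_data_uri and the sample URI are made up for illustration, and the fallback media type is the default RFC 2397 gives when none is stated; media type parameters such as charset are left attached to the returned string.

```python
from base64 import b64decode
from urllib.parse import unquote_to_bytes


def split_data_uri(data_uri):
    """Return (media_type, data_bytes) for a data: URI (hypothetical helper)."""
    if not data_uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is the payload.
    header, payload = data_uri[len("data:"):].split(",", 1)
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    # Default from RFC 2397 when the media type is omitted.
    media_type = header or "text/plain;charset=US-ASCII"
    # The payload may be percent-encoded, so unquote it in either case.
    raw = unquote_to_bytes(payload)
    data = b64decode(raw) if is_base64 else raw
    return media_type, data


# Sample URI (an assumption, not from the question): the payload decodes to b"hello".
media_type, data = split_data_uri("data:text/plain;base64,aGVsbG8=")
print(media_type, data)
```

For anything beyond simple cases, a maintained parser is probably a safer choice than a hand-rolled one.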
JRodDynamite answered Oct 13 '22 23:10


Since Python 3.4, urllib.request.urlopen accepts data URIs directly; under the hood it uses urllib.request.DataHandler.

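from urllib.request import urlopen

data_uri = "..."  # the data: URI to parse

# urlopen decodes the payload; response.read() returns the raw bytes
with urlopen(data_uri) as response:
    data = response.read()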
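
Since the question also asks for the content type, here is a small sketch of reading it from the same response object. It assumes the response exposes the media type through its headers attribute, as urllib responses generally do; the sample URI is invented for illustration.

```python
from urllib.request import urlopen

# Hypothetical sample URI: the base64 payload decodes to b"hello".
sample_uri = "data:text/plain;base64,aGVsbG8="

with urlopen(sample_uri) as response:
    # headers is an email.message.Message, so get_content_type() is available.
    content_type = response.headers.get_content_type()
    data = response.read()

print(content_type, data)
```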
bl79 answered Oct 14 '22 01:10


w3lib (a library used by Scrapy) has a function to parse data URIs:

>>> from w3lib.url import parse_data_uri
>>> parse_data_uri('')
ParseDataURIResult(media_type='image/png', media_type_parameters={}, data=b'\x89PNG\r\n\x1a')
Mikhail Korobov answered Oct 14 '22 00:10