<div class="someClass"> <a href="href"> <img alt="some" src="some"/> </a> </div>
I want to extract the source (src) attribute from an image (img) tag using BeautifulSoup. I use bs4, and a.attrs['src'] does not give me the src, although I can get the href the same way. What should I do?
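To extract the text that sits directly under an element in Beautiful Soup, call find_all(text=True, recursive=False) on that element. Note the following: text=True means to look for text nodes instead of elements (newer versions of Beautiful Soup also accept the name string=True for this argument), and recursive=False limits the search to the element's direct children.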
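As a minimal sketch of the direct-text lookup described above, the snippet below runs it on a small piece of made-up markup. The HTML string and the variable names are assumptions for illustration, and string=True is used here as the newer spelling of text=True.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <div> has text of its own plus a nested <a>.
html = '<div class="someClass">direct text <a href="href">link text</a> more direct text</div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="someClass")

# string=True matches text nodes; recursive=False only looks at direct children,
# so the text inside <a> is skipped.
direct_text = div.find_all(string=True, recursive=False)
direct_strings = [str(s).strip() for s in direct_text]
print(direct_strings)
```

The text inside the nested a tag is left out because it is a grandchild of the div, not a direct child.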
We can access a tag's attributes by treating the tag like a dictionary, for example tag['src'], or through its attrs dictionary, for example tag.attrs['src']. An attribute can only be read from the tag that actually carries it.
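Applied to the markup from the question, a short sketch might look like this. It assumes the html.parser backend and illustrative variable names; the point is that src lives on the img tag nested inside a, so it has to be read from img rather than from a.

```python
from bs4 import BeautifulSoup

# The markup from the question.
html = '<div class="someClass"> <a href="href"> <img alt="some" src="some"/> </a> </div>'
soup = BeautifulSoup(html, "html.parser")

a = soup.find("a")
href = a["href"]              # works: href is an attribute of <a>
has_src = "src" in a.attrs    # False: <a> itself has no src attribute

img = a.find("img")           # step down to the nested <img>
src = img["src"]              # dictionary-style access
alt = img.attrs["alt"]        # same lookup through the attrs dictionary
missing = img.get("title")    # .get() returns None instead of raising KeyError

print(href, src, alt, missing)
```

This is why a.attrs['src'] fails in the question while a.attrs['href'] works.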
You can use BeautifulSoup to extract the src attribute of an HTML img tag. In my example, htmlText contains the img tags themselves, but the same approach works for a URL too, together with urllib2.
For URLs
# Python 2 with the legacy BeautifulSoup 3 package
from BeautifulSoup import BeautifulSoup as BSHTML
import urllib2

page = urllib2.urlopen('http://www.youtube.com/')
soup = BSHTML(page)
images = soup.findAll('img')
for image in images:
    # print image source
    print image['src']
    # print alternate text
    print image['alt']
For text containing img tags
# Python 2 with the legacy BeautifulSoup 3 package
from BeautifulSoup import BeautifulSoup as BSHTML

htmlText = """<img src="https://src1.com/" />
<img src="https://src2.com/" />"""
soup = BSHTML(htmlText)
images = soup.findAll('img')
for image in images:
    # print image source
    print image['src']
Python 3 : Updated on 2022-02-02
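from bs4 import BeautifulSoup as BSHTML
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request

page = urllib.request.urlopen('https://github.com/abushoeb/emotag')
soup = BSHTML(page, 'html.parser')
images = soup.find_all('img')
for image in images:
    # print image source (None if the tag has no src)
    print(image.get('src'))
    # print alternate text (None if the tag has no alt)
    print(image.get('alt'))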
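Indexing a tag with ['src'] or ['alt'] raises KeyError when that attribute is absent, which is common on real pages. The sketch below shows one way to guard against that without a network request. The HTML string is made up for illustration, and filtering with src=True is just one possible approach.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one complete tag, one without alt, one without src.
html_text = """
<img src="https://src1.com/" alt="first">
<img src="https://src2.com/">
<img alt="no source here">
"""
soup = BeautifulSoup(html_text, "html.parser")

# src=True keeps only the <img> tags that actually have a src attribute.
sources = [img["src"] for img in soup.find_all("img", src=True)]

# .get() returns the given default instead of raising KeyError.
alts = [img.get("alt", "") for img in soup.find_all("img")]

print(sources)
print(alts)
```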
Install modules if needed
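# Python 3
pip install beautifulsoup4
# urllib is part of the Python standard library and needs no install; urllib3 is a separate third-party package that the code above does not use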