<div class="someClass">
    <a href="href">
        <img alt="some" src="some"/>
    </a>
</div>

I want to extract the source (i.e. src) attribute from an image (i.e. img) tag using BeautifulSoup. I use bs4 and I cannot use a.attrs['src'] to get the src, but I can get href. What should I do?
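To extract only the text that sits directly under an element in Beautiful Soup, call find_all(text=True, recursive=False) on that element. Note the following: text=True tells find_all to look for text nodes instead of elements (Beautiful Soup 4.4.0 and later also accept string=True for the same argument), and recursive=False restricts the search to the element's direct children.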
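As a minimal sketch of the direct-text lookup described above: the sample markup and the variable names (html, div, direct_text) are made up for illustration, and the snippet assumes bs4 4.4.0 or later so that string=True is available.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the <div> has two direct text nodes and one child tag.
html = "<div>outer text<span>inner text</span> tail</div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# string=True matches text nodes; recursive=False limits the search
# to direct children, so the text inside <span> is skipped.
direct_text = div.find_all(string=True, recursive=False)

for piece in direct_text:
    print(repr(str(piece)))
```

Without recursive=False the same call would also return the text nested inside the span.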
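We can access a tag's attributes by treating the tag like a dictionary: tag['src'] returns the value of the src attribute, and tag.attrs exposes all of the tag's attributes as a dictionary. An attribute can only be read from the tag that carries it, so in the HTML from the question src has to be read from the img tag, not from the enclosing a tag, which only has href.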
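Applied to the HTML from the question, dictionary-style access might look like the sketch below. The variable names (link, img, href, src, alt, missing) are only illustrative, and html.parser is chosen here as an assumption because it ships with Python.

```python
from bs4 import BeautifulSoup

html = """
<div class="someClass">
    <a href="href">
        <img alt="some" src="some"/>
    </a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a")
img = link.find("img")      # the <img> is a child of <a>; src lives on it

href = link["href"]         # dictionary-style access on the <a> tag
src = img["src"]            # dictionary-style access on the <img> tag
alt = img.attrs["alt"]      # the same lookup through the attrs dictionary
missing = link.get("src")   # .get() returns None instead of raising KeyError

print(href, src, alt, missing)
```

This also shows why a.attrs['src'] fails in the question: the a tag has no src attribute of its own, so the lookup has to go through the nested img tag.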
You can use BeautifulSoup to extract the src attribute of an HTML img tag. In my example, htmlText contains the img tags themselves, but the same approach works for a URL too, together with urllib2 (Python 2) or urllib.request (Python 3).
For URLs
from BeautifulSoup import BeautifulSoup as BSHTML
import urllib2

page = urllib2.urlopen('http://www.youtube.com/')
soup = BSHTML(page)
images = soup.findAll('img')
for image in images:
    # print image source
    print image['src']
    # print alternate text
    print image['alt']

For Texts with img tag
from BeautifulSoup import BeautifulSoup as BSHTML

# each img tag must be closed, otherwise the parser may merge the two tags
htmlText = """<img src="https://src1.com/" />
<img src="https://src2.com/" />
"""
soup = BSHTML(htmlText)
images = soup.findAll('img')
for image in images:
    print image['src']

Python 3 : Updated on 2022-02-02
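from bs4 import BeautifulSoup as BSHTML
import urllib.request  # "import urllib" alone does not load the request submodule

page = urllib.request.urlopen('https://github.com/abushoeb/emotag')
soup = BSHTML(page, 'html.parser')  # name the parser explicitly to avoid a warning
images = soup.find_all('img')

for image in images:
    # print image source (None if the tag has no src attribute)
    print(image.get('src'))
    # print alternate text (None if the tag has no alt attribute)
    print(image.get('alt'))

Install modules if needed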
# python 3
pip install beautifulsoup4
# urllib.request is part of the Python 3 standard library, so nothing else needs to be installed
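Real pages often contain img tags without a src or alt attribute, and image['src'] raises KeyError on those. One way to guard against that, sketched here on made-up markup (the URLs and the names images, sources and alts are illustrative assumptions), is to filter on the attribute in find_all and use .get() with a default:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second tag has no src, the third has no alt.
html = """
<img src="https://src1.com/a.png" alt="first">
<img alt="no source here">
<img src="https://src2.com/b.png">
"""
soup = BeautifulSoup(html, "html.parser")

# src=True keeps only <img> tags that actually carry a src attribute.
images = soup.find_all("img", src=True)

sources = [img["src"] for img in images]
# .get() with a default avoids KeyError when alt is absent.
alts = [img.get("alt", "") for img in images]

for src, alt in zip(sources, alts):
    print(src, repr(alt))
```

The same filtering works unchanged on a soup built from a downloaded page.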