
Extract src attribute from img tag using BeautifulSoup

<div class="someClass">
    <a href="href">
        <img alt="some" src="some"/>
    </a>
</div>

I want to extract the source (i.e. src) attribute from an image (i.e. img) tag using BeautifulSoup. I use bs4 and I cannot use a.attrs['src'] to get the src, but I can get href. What should I do?

iDelusion asked May 15 '17 14:05


People also ask

How do you extract text from a tag in BeautifulSoup?

To extract text that sits directly under an element in Beautiful Soup, call find_all(text=True, recursive=False) on that element. Here, text=True matches text nodes instead of elements, and recursive=False limits the search to the element's direct children.
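As a rough illustration of the call described above, here is a minimal sketch. The HTML snippet and variable names are made up for the example, and it uses string=True, which is the name bs4 has used for the text argument since version 4.4.0.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a paragraph with text both directly inside it
# and inside a nested <b> element.
html = '<p>Hello <b>bold</b> world</p>'
soup = BeautifulSoup(html, 'html.parser')
paragraph = soup.find('p')

# string=True matches text nodes; recursive=False restricts the search
# to direct children, so the text inside <b> is skipped.
direct_text = [str(s) for s in paragraph.find_all(string=True, recursive=False)]
print(direct_text)
```

The text inside the nested b tag is left out because it is not a direct child of the paragraph.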

Which method in BeautifulSoup is used for extracting the attributes from HTML?

We can access a tag's attributes by treating the tag like a dictionary, for example tag['src']. The full set of attributes is available as a dictionary through tag.attrs.
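A short sketch of dictionary-style attribute access, assuming bs4 and the img tag from the question above. The 'title' lookup is only there to show what happens when an attribute is missing.

```python
from bs4 import BeautifulSoup

# The img tag from the question, parsed on its own.
soup = BeautifulSoup('<img alt="some" src="some"/>', 'html.parser')
img = soup.find('img')

all_attrs = img.attrs        # every attribute as a plain dict
src = img['src']             # dictionary-style access; raises KeyError if absent
title = img.get('title')     # .get() returns None when the attribute is missing

print(all_attrs, src, title)
```

Using .get() is the safer choice when some tags on a page may lack the attribute.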


1 Answer

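You can use BeautifulSoup to extract the src attribute of an HTML img tag. Note that in your markup src belongs to the img tag nested inside a, not to a itself, which is why a.attrs['src'] fails while href works. In my example, htmlText contains the img tags themselves, but the same approach works for a page fetched from a URL with urllib2.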

For URLs

from BeautifulSoup import BeautifulSoup as BSHTML
import urllib2

page = urllib2.urlopen('http://www.youtube.com/')
soup = BSHTML(page)
images = soup.findAll('img')
for image in images:
    # print image source
    print image['src']
    # print alternate text
    print image['alt']

For Texts with img tag

from BeautifulSoup import BeautifulSoup as BSHTML

htmlText = """<img src="https://src1.com/" /> <img src="https://src2.com/" /> """
soup = BSHTML(htmlText)
images = soup.findAll('img')
for image in images:
    print image['src']

Python 3 : Updated on 2022-02-02

from bs4 import BeautifulSoup as BSHTML
import urllib.request

page = urllib.request.urlopen('https://github.com/abushoeb/emotag')
soup = BSHTML(page, 'html.parser')
images = soup.find_all('img')

for image in images:
    # print image source (None if the tag has no src attribute)
    print(image.get('src'))
    # print alternate text (None if the tag has no alt attribute)
    print(image.get('alt'))
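Applied to the markup from the question, the same idea might look like the sketch below. It assumes bs4 with the built-in html.parser; the variable names are illustrative only. The point is that src lives on the img nested inside a, so the lookup has to go one level deeper than the a tag.

```python
from bs4 import BeautifulSoup

# The markup from the question.
html = """
<div class="someClass">
    <a href="href">
        <img alt="some" src="some"/>
    </a>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# href is an attribute of <a>, so it can be read from the link directly.
link = soup.find('div', class_='someClass').find('a')
href = link['href']

# src is an attribute of the nested <img>, so find that tag first.
src = link.find('img')['src']

# The same lookup written as a CSS selector.
src_via_select = soup.select_one('div.someClass a img')['src']

print(href, src, src_via_select)
```

Either form works; the CSS selector version is shorter when the nesting is deep.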

Install modules if needed

# python 3
pip install beautifulsoup4
# urllib.request is part of the Python standard library, so no separate install is needed
Abu Shoeb answered Sep 30 '22 02:09