
Python - Getting all images from an html file

Can someone help me parse an HTML file in Python to get the links for all the images in it?

Preferably without a third-party module...

Thanks!

user377419 asked Nov 28 '10 03:11



1 Answer

You can use Beautiful Soup. I know you asked for a solution without a third-party module, but it is an ideal tool for parsing HTML.

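import urllib.request
from bs4 import BeautifulSoup

# Fetch the page and parse it with the standard-library parser backend
page = BeautifulSoup(urllib.request.urlopen("http://www.url.com"), "html.parser")
# Keep only <img> tags that have a src attribute and collect the links
image_links = [img["src"] for img in page.find_all("img", src=True)]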
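
If a third-party module really is off the table, the standard library's html.parser (Python 3) can probably cover this case as well. Below is a minimal sketch, assuming the HTML is already in a string. The names ImageSrcParser, get_image_links and the sample markup are made up for illustration, and resolving relative paths with urljoin is an extra I added, not something the question asked for.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcParser(HTMLParser):
    """Hypothetical helper: collects the src of every <img> tag it sees."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs. Self-closing <img/> tags also end up here.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against base_url when one is given
                self.links.append(urljoin(self.base_url, src))


def get_image_links(html, base_url=""):
    """Return the image links found in an HTML string, in document order."""
    parser = ImageSrcParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample markup, only for demonstration
sample = '<p><img src="/a.png"><IMG SRC="b.jpg" alt="b"><img alt="no src"></p>'
print(get_image_links(sample, "http://www.url.com/page/"))
# → ['http://www.url.com/a.png', 'http://www.url.com/page/b.jpg']
```

To run it against a local file, read the file into a string first and pass that to get_image_links. Badly broken markup may be handled less gracefully here than with Beautiful Soup, so treat this as a starting point.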
Russell Dias answered Nov 22 '22 00:11