BeautifulSoup find only elements where an attribute contains a sub-string? Is this possible?

I have a call to find_all() in my BeautifulSoup code. It currently returns every image, but how can I target only the images whose src contains the sub-string "placeholder"?

for t in soup.find_all('img'):  # wanted: only where "placeholder" is in img['src']
    print(t)
Simon Kiely asked Jan 30 '15 17:01

1 Answer

You can pass a function as the src keyword argument:

for t in soup.find_all('img', src=lambda x: x and 'placeholder' in x):
    print(t['src'])
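
The `x and` part of the lambda is a guard: as far as I know, BeautifulSoup calls the function with None for any tag that lacks the attribute, and `'placeholder' in None` would raise a TypeError. Here is a minimal sketch of that behaviour, assuming bs4 is installed. The sample HTML and the helper name has_placeholder are made up for illustration:

```python
from bs4 import BeautifulSoup

# Made-up sample markup: the second <img> has no src attribute at all.
html = """
<img src="/static/placeholder.png">
<img alt="no source">
<img src="/photos/cat.jpg">
"""
soup = BeautifulSoup(html, "html.parser")


def has_placeholder(value):
    # value is None when the tag lacks src, so check that before using `in`.
    return value is not None and "placeholder" in value


# Collect the src of every <img> whose src contains "placeholder".
matches = [img["src"] for img in soup.find_all("img", src=has_placeholder)]
print(matches)
```

A named function like this does the same job as the lambda, and is easier to extend if the condition grows.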

Or, a regular expression:

import re

for t in soup.find_all('img', src=re.compile(r'placeholder')):
    print(t['src'])

Or, instead of find_all(), use select():

for t in soup.select('img[src*="placeholder"]'):
    print(t['src'])
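
To check that the three approaches agree, here is a small self-contained sketch, assuming bs4 (with its CSS selector support) is installed. The sample HTML and the variable names are invented for this example:

```python
import re

from bs4 import BeautifulSoup

# Invented sample markup: two placeholder images, one normal image,
# and one <img> without any src attribute.
html = """
<img src="/img/placeholder-small.png">
<img src="/img/logo.png">
<img alt="missing src">
<img src="/img/avatar_placeholder.gif">
"""
soup = BeautifulSoup(html, "html.parser")

# 1. Function filter: guard against None for tags without src.
by_function = [t["src"] for t in soup.find_all("img", src=lambda x: x and "placeholder" in x)]

# 2. Compiled regular expression: matched anywhere in the attribute value.
by_regex = [t["src"] for t in soup.find_all("img", src=re.compile(r"placeholder"))]

# 3. CSS attribute selector: *= means "contains the sub-string".
by_css = [t["src"] for t in soup.select('img[src*="placeholder"]')]

print(by_function)
print(by_function == by_regex == by_css)
```

If the sub-string could contain regex metacharacters (a dot, for example), wrapping it in re.escape() before compiling keeps the regex variant a plain sub-string match.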
alecxe answered Oct 04 '22 22:10