I have a call to find_all() in my BeautifulSoup code. It currently gets me all images, but if I wanted to target only images that have the substring "placeholder" in their src, how could I do this?
for t in soup.find_all('img'):  # WHERE img.src.contains("placeholder")
read()
f.close()
from BeautifulSoup import BeautifulStoneSoup
soup = BeautifulStoneSoup(s)
inputTags = soup.findAll(attrs={"name": "stainfo"})
### You may be able to do findAll("input", attrs={"name": "stainfo"})
output = [x["stainfo"] for x in inputTags]
print output
### This will print a list of the values.
find() method: The find method returns the first tag that matches the specified name or id, as an object of type bs4. Example: For instance, consider a simple HTML webpage having different paragraph tags.
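A minimal sketch of find() on a page with several paragraph tags. The HTML string and the ids "intro" and "body" are made up for illustration, and the built-in "html.parser" backend is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page with several paragraph tags.
html = """
<html><body>
  <p id="intro">First paragraph</p>
  <p id="body">Second paragraph</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

first_p = soup.find("p")       # first <p> in document order
by_id = soup.find(id="body")   # first tag whose id attribute is "body"
missing = soup.find("table")   # no match, so find() returns None

first_text = first_p.get_text()
by_id_text = by_id.get_text()
```

Because find() returns None when nothing matches, it is usually worth checking the result before accessing attributes on it.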
Extract attribute from an element: BeautifulSoup allows you to extract a single attribute from an element given its name, just like how you would access a Python dictionary. For example, the following code snippet prints out the first author link from the Quotes to Scrape page.
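As a rough illustration of dictionary-style attribute access, the sketch below parses an inline HTML string instead of fetching a live page. The markup, author name, and link path are invented for the example and are not taken from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on a quote listing; nothing is fetched.
html = (
    '<div class="quote"><span>by <small class="author">Jane Doe</small> '
    '<a href="/author/Jane-Doe">(about)</a></span></div>'
)

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

href = link["href"]        # dictionary-style access; raises KeyError if absent
title = link.get("title")  # .get() returns None when the attribute is missing
```

Using .get() is the safer choice when an attribute may not be present on every element.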
Beautifulsoup: Get the attribute value of an element
1. Find all by ul tag.
2. Iterate over the result.
3. Get the class value of each element.
In the following example, we'll get the href attribute value.
Beautifulsoup: Find all by multiple attributes
attrs = { "attribute": "value", "attribute": "value", ... }
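The steps above, together with matching on multiple attributes, might look like the following sketch. The ul elements, their class and id values, and the link targets are assumptions chosen only to make the example self-contained.

```python
from bs4 import BeautifulSoup

# Hypothetical lists; class/id values are illustrative only.
html = """
<ul class="menu" id="main"><li><a href="/home">Home</a></li></ul>
<ul class="menu" id="footer"><li><a href="/about">About</a></li></ul>
<ul class="other" id="misc"><li><a href="/misc">Misc</a></li></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Find all <ul> tags, iterate, and read the class value of each one.
# "class" is multi-valued, so BeautifulSoup returns it as a list.
classes = [ul["class"] for ul in soup.find_all("ul")]

# Match on several attributes at once by passing a dict to attrs.
matches = soup.find_all("ul", attrs={"class": "menu", "id": "footer"})
hrefs = [a["href"] for ul in matches for a in ul.find_all("a")]
```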
The easiest way to do this is with the CSS-style select method. What if I don't know the value beforehand, and just want to find tags containing the valign attribute? @MasayoMusic You can do soup.select('td[valign]') to select all <td> elements having a 'valign' attribute. You can also omit the tag name to select all elements with a 'valign' attribute.
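A small sketch of the attribute-presence selectors mentioned here, run against a made-up table. It assumes a BeautifulSoup 4 version whose select() supports CSS attribute selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical table; only some cells carry a valign attribute.
html = """
<table>
  <tr><td valign="top">A</td><td>B</td></tr>
  <tr><th valign="middle">C</th><td valign="bottom">D</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Only <td> elements that have a valign attribute, whatever its value.
td_cells = [td.get_text() for td in soup.select("td[valign]")]

# Any element that has a valign attribute.
any_cells = [el.get_text() for el in soup.select("[valign]")]
```

Note that there is no space between td and [valign]; with a space, the selector would instead match descendants of a <td> that carry the attribute.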
Ideas? Actually, the h2 restriction is ignored according to the BeautifulSoup documentation: "If you use text, then any values you give for name and the keyword arguments are ignored."
You can pass a function in the src keyword argument:
for t in soup.find_all('img', src=lambda x: x and 'placeholder' in x):
Or, a regular expression:
import re
for t in soup.find_all('img', src=re.compile(r'placeholder')):
Or, instead of find_all(), use select():
for t in soup.select('img[src*=placeholder]'):
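For comparison, here is a self-contained sketch that runs all three approaches against the same made-up markup. The image paths are assumptions for illustration; one img deliberately has no src, to show why the lambda guards against None.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two placeholder images, one regular image,
# and one <img> with no src attribute at all.
html = """
<img src="/static/placeholder.png">
<img src="/static/logo.png">
<img alt="no src attribute">
<img src="/img/placeholder-large.jpg">
"""

soup = BeautifulSoup(html, "html.parser")

# 1. Function filter: x is None when src is missing, hence the "x and" guard.
by_function = [
    t["src"]
    for t in soup.find_all("img", src=lambda x: x and "placeholder" in x)
]

# 2. Regular expression: matched against the attribute value with search().
by_regex = [t["src"] for t in soup.find_all("img", src=re.compile(r"placeholder"))]

# 3. CSS selector: *= means "attribute value contains this substring".
by_selector = [t["src"] for t in soup.select('img[src*="placeholder"]')]
```

All three lists should contain the same two src values here, so the choice mostly comes down to readability and how complex the match needs to be.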