 

Searching a large string for a file path and returning the full path plus file name

I've got a little project where I'm trying to download a series of wallpapers from a web page. I'm new to Python.

I'm using the urllib library, which returns a long string of web page data that includes links like:

<a href="http://website.com/wallpaper/filename.jpg">

I know that the URL of every file I need to download starts with

'http://website.com/wallpaper/'  

How can I search the page source for this portion of text and return the rest of the image link, ending with the .jpg extension?

r'http://website.com/wallpaper/ xxxxxx .jpg'

I'm wondering whether I could write a regular expression where the xxxx portion can be anything, so it only checks for the path and the .jpg extension, and then returns the whole string once a match is found.

Am I on the right track?
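A minimal sketch of the approach described above, assuming the page is fetched with Python 3's `urllib.request`, decoded as UTF-8, and that each link sits inside a double-quoted `href`. The names `fetch_page`, `find_wallpaper_links`, and `WALLPAPER_RE` are hypothetical, and `http://website.com/` is the placeholder from the question. The `[^"]+?` part plays the role of the xxxx portion: it matches any run of characters except a closing quote.

```python
import re
import urllib.request

# Hypothetical pattern: fixed folder prefix, then any non-quote characters
# (the "xxxx" part), then the .jpg extension.
WALLPAPER_RE = re.compile(r'http://website\.com/wallpaper/[^"]+?\.jpg')


def fetch_page(url):
    """Download a page and decode it to a str (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


def find_wallpaper_links(html):
    """Return every full wallpaper URL found in the page source."""
    # With no capturing groups, findall returns the whole matched text.
    return WALLPAPER_RE.findall(html)
```

Calling `find_wallpaper_links(fetch_page('http://website.com/'))` would then return a list of complete URLs, each ending in `.jpg`; links to other extensions in the same folder are skipped because `[^"]` cannot run past the closing quote.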

asked Apr 27 '26 07:04 by tkezy



2 Answers

BeautifulSoup is pretty convenient for this sort of thing.

import re
import urllib3
from bs4 import BeautifulSoup

# One regex covers both conditions: the wallpaper folder and a .jpg ending
wallpaper_regex = re.compile(r'website\.com/wallpaper/.*\.jpg$')

pool = urllib3.PoolManager()
response = pool.request('GET', 'http://your_website.com/')
# Parse the response body (response.data), not the response object itself
soup = BeautifulSoup(response.data, 'html.parser')

# Keep only <a> tags whose href matches the combined pattern
result_list = [a.get('href') for a in soup.find_all('a', href=wallpaper_regex)]
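If installing BeautifulSoup isn't an option, the standard library's `html.parser` module can do similar filtering. This is a swapped-in technique, not the code from this answer: a small `HTMLParser` subclass (the names `WallpaperLinkParser` and `wallpaper_links` are hypothetical) collects the `href` of every `<a>` tag and keeps those matching one combined pattern for the folder and the `.jpg` extension.

```python
import re
from html.parser import HTMLParser

# One regex covers both conditions: the wallpaper folder and a .jpg ending.
WALLPAPER_HREF = re.compile(r'website\.com/wallpaper/.*\.jpg$')


class WallpaperLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at wallpaper JPEGs."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value and WALLPAPER_HREF.search(value):
                self.links.append(value)


def wallpaper_links(html):
    """Return matching hrefs in document order."""
    parser = WallpaperLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```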
answered Apr 28 '26 20:04 by Timothy Schmitz



A very basic regex should do, for example:

(http:\/\/website\.com\/wallpaper\/[\w\d_-]*?\.jpg)

With this pattern, $1 (the first capturing group) returns the whole string.
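In Python, the pattern above can be used with `re.findall`; because it contains exactly one capturing group, `findall` returns the text of that group, which is the whole URL. A minimal sketch, where the sample `page` string is made up for illustration:

```python
import re

# The answer's pattern, written as a raw string so the backslashes reach re
pattern = re.compile(r'(http:\/\/website\.com\/wallpaper\/[\w\d_-]*?\.jpg)')

# Made-up page source for illustration
page = ('<a href="http://website.com/wallpaper/sunset_01.jpg">'
        '<a href="http://website.com/wallpaper/forest-2.jpg">')

urls = pattern.findall(page)
print(urls)
# → ['http://website.com/wallpaper/sunset_01.jpg', 'http://website.com/wallpaper/forest-2.jpg']
```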

If you instead use

(http:\/\/website\.com\/wallpaper\/([\w\d_-]*?)\.jpg)

then $1 gives the whole URL and $2 gives only the file name, without the .jpg extension.
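In Python, $1 and $2 correspond to `match.group(1)` and `match.group(2)` on a match object. A sketch using `re.finditer` over a made-up page string:

```python
import re

# Group 1: whole URL; group 2: file name without the .jpg extension
pattern = re.compile(r'(http:\/\/website\.com\/wallpaper\/([\w\d_-]*?)\.jpg)')

# Made-up page source for illustration
page = ('<a href="http://website.com/wallpaper/sunset_01.jpg">'
        '<a href="http://website.com/wallpaper/forest-2.jpg">')

# Collect (full URL, bare file name) pairs for every match
results = [(m.group(1), m.group(2)) for m in pattern.finditer(page)]
for url, name in results:
    print(url, name)
```

Note that `pattern.findall(page)` would return the same pairs directly, since `findall` yields a tuple of groups when a pattern has more than one group.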

Note: whether / must be escaped as \/ depends on the language. Python's re accepts \/, but it isn't needed there; a plain / matches the same way.
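To finish the task from the question, each collected URL can then be saved to disk. A minimal sketch using the standard library's `urllib.request.urlretrieve` and `urllib.parse.urlsplit`; the helper names `local_filename` and `download_all` and the `wallpapers` target folder are hypothetical:

```python
import os
import posixpath
import urllib.parse
import urllib.request


def local_filename(url):
    """Return the last path segment of a URL, e.g. 'filename.jpg'."""
    # urlsplit drops any query string, so '?size=large' is ignored
    path = urllib.parse.urlsplit(url).path
    return posixpath.basename(path)


def download_all(urls, target_dir='wallpapers'):
    """Download each URL into target_dir, keeping the original file name."""
    os.makedirs(target_dir, exist_ok=True)
    for url in urls:
        destination = os.path.join(target_dir, local_filename(url))
        urllib.request.urlretrieve(url, destination)
```

Passing the list produced by any of the matching approaches to `download_all` saves each image under its original file name.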

answered Apr 28 '26 21:04 by Saif



