Searching Large String for file path. Return filepath + filename

Question

I've got a small project where I'm trying to download a series of wallpapers from a web page. I'm new to Python.

I'm using the urllib library, which returns a long string of web page data that includes

<a href="http://website.com/wallpaper/filename.jpg">

I know that every file I need to download starts with

'http://website.com/wallpaper/'

How can I search the page source for this prefix and return the rest of the image link, up to and including the .jpg extension?

r'http://website.com/wallpaper/ xxxxxx .jpg'

My idea is a regular expression in which the xxxxxx part acts as a wildcard. It would only check for the path prefix and the .jpg extension, then return the whole string once a match is found.

Am I on the right track?
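The approach described in the question can work for simple pages. A minimal sketch, assuming the page source is already in a string (the sample HTML and the helper name `find_wallpaper_urls` are hypothetical): the known prefix is escaped with `re.escape`, the `xxxxxx` part becomes a run of characters that cannot end an `href` value, and `.jpg` is required at the end. Because the pattern has no capture groups, `re.findall` returns each whole matched URL.

```python
import re

# Known URL prefix from the question; re.escape makes the dots literal
PREFIX = "http://website.com/wallpaper/"

# The "xxxxxx" part: one or more characters that cannot end an href value
# (no quotes, whitespace, or angle brackets), followed by a literal ".jpg"
WALLPAPER_RE = re.compile(re.escape(PREFIX) + r'[^"\'\s<>]+\.jpg')


def find_wallpaper_urls(page_source):
    """Return every full wallpaper URL in the page source (hypothetical helper)."""
    # No capture groups, so findall returns the whole matched text
    return WALLPAPER_RE.findall(page_source)


# Hypothetical sample of what urllib might return
sample_html = '''
<a href="http://website.com/wallpaper/sunset.jpg">Sunset</a>
<a href="http://website.com/wallpaper/forest-2.jpg">Forest</a>
<a href="http://website.com/about.html">About</a>
<a href="http://website.com/wallpaper/thumb.png">Thumb</a>
'''

print(find_wallpaper_urls(sample_html))
# ['http://website.com/wallpaper/sunset.jpg', 'http://website.com/wallpaper/forest-2.jpg']
```

Regexes are fine for a known, regular page like this one; for arbitrary HTML a real parser, as in the answers below, is more robust.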

Timothy Schmitz · Accepted Answer

BeautifulSoup is pretty convenient for this sort of thing.

import re
import urllib3
from bs4 import BeautifulSoup

# Raw strings keep the backslashes intact for the regex engine
jpg_regex = re.compile(r'\.jpg$')
site_regex = re.compile(r'website\.com/wallpaper/')

pool = urllib3.PoolManager()
response = pool.request('GET', 'http://your_website.com/')
# Parse the response body (bytes), not the response object itself
soup = BeautifulSoup(response.data, 'html.parser')

# Keep only links whose href matches BOTH patterns.
# (`jpg_list and site_list` would simply evaluate to site_list.)
result_list = [
    a.get('href')
    for a in soup.find_all('a', href=jpg_regex)
    if site_regex.search(a.get('href'))
]
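If installing BeautifulSoup is not an option, the same two-pattern filter can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not part of the answer above; the class name `WallpaperLinkParser` and the sample markup are hypothetical. For a real page you would feed it the decoded page text (for example `response.data.decode('utf-8')`, assuming the page is UTF-8) instead of the sample string.

```python
import re
from html.parser import HTMLParser

# Same two conditions as the BeautifulSoup answer, written as raw strings
JPG_RE = re.compile(r'\.jpg$')
SITE_RE = re.compile(r'website\.com/wallpaper/')


class WallpaperLinkParser(HTMLParser):
    """Collect href values of <a> tags that satisfy both patterns (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, and value may be None
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href and JPG_RE.search(href) and SITE_RE.search(href):
            self.links.append(href)


sample_html = (
    '<a href="http://website.com/wallpaper/sunset.jpg">Sunset</a>'
    '<a href="http://website.com/gallery/cat.jpg">Not a wallpaper</a>'
    '<a href="http://website.com/wallpaper/index.html">Index</a>'
)

parser = WallpaperLinkParser()
parser.feed(sample_html)
print(parser.links)  # ['http://website.com/wallpaper/sunset.jpg']
```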

Saif · Answer

I think a very basic regex will do, for example:

(http:\/\/website\.com\/wallpaper\/[\w\d_-]*?\.jpg)

Capture group 1 ($1) then holds the whole URL.
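In Python the pattern above might look like the sketch below (the sample page string is made up). Two small adjustments, both assumptions about what the answer intends rather than part of it: `/` is left unescaped because Python's `re` treats it as an ordinary character, and `[\w\d_-]` is shortened to `[\w-]`, since `\w` already covers digits and the underscore.

```python
import re

# The answer's pattern as a Python raw string (one capture group)
WALLPAPER_RE = re.compile(r'(http://website\.com/wallpaper/[\w-]*?\.jpg)')

page = (
    '<a href="http://website.com/wallpaper/blue_sky-01.jpg">Blue sky</a>'
    '<a href="http://website.com/wallpaper/night.jpg">Night</a>'
)

# search() finds the first match; group(1) plays the role of $1
first = WALLPAPER_RE.search(page)
print(first.group(1))  # http://website.com/wallpaper/blue_sky-01.jpg

# With exactly one capture group, findall() returns group 1 of every match
all_urls = WALLPAPER_RE.findall(page)
print(all_urls)
# ['http://website.com/wallpaper/blue_sky-01.jpg', 'http://website.com/wallpaper/night.jpg']
```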

And if you use

(http:\/\/website\.com\/wallpaper\/([\w\d_-]*?)\.jpg)

then $1 gives the whole URL and $2 gives only the file name, without the .jpg extension.
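A sketch of the two-group version in Python, using a made-up sample string. Groups are numbered by the position of their opening parenthesis, so the outer group is 1 (whole URL) and the inner group is 2 (bare file name). As in the previous example, `/` is left unescaped and `[\w\d_-]` is written as the equivalent `[\w-]`.

```python
import re

# Outer group 1: whole URL; inner group 2: file name without ".jpg"
WALLPAPER_RE = re.compile(r'(http://website\.com/wallpaper/([\w-]*?)\.jpg)')

page = (
    '<a href="http://website.com/wallpaper/sunset.jpg">Sunset</a>'
    '<a href="http://website.com/wallpaper/forest-2.jpg">Forest</a>'
)

for match in WALLPAPER_RE.finditer(page):
    url = match.group(1)   # equivalent of $1
    name = match.group(2)  # equivalent of $2
    print(name, '->', url)

# With two groups, findall() returns one (group1, group2) tuple per match
pairs = WALLPAPER_RE.findall(page)
```

The name from group 2 could then serve as a local file name (for example `name + '.jpg'`) when saving each image.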

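Note: whether the slash needs escaping (\/) depends on the language. Python's re module treats / as an ordinary character, so the escape is harmless but unnecessary there. Python also reads groups with match.group(1) and match.group(2) rather than $1 and $2.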


Tags: python, string, regex, html-parsing, beautifulsoup

tkezy
