Strip random characters from url

Question

I have a list of URLs as follows:

urls = [
    'http://www.example.com/search?q=Term&page=0',
    'http://www.example.com/search?q=Term&page=1',
    'http://www.example.com/search?q=Term&page=2'
]

Here Term can be any search term: Europe, London, etc.

The relevant part of my code is the following:

for url in urls:
    file_name = url.replace('http://www.example.com/search?q=', '').replace('=', '').replace('&', '')
    file_name = file_name + '.html'

which results in:

Termpage0.html
Termpage1.html
and so on.

How can I strip Term from each URL in the list so that the result is:

page0.html
page1.html
and so on?

niemmi · Accepted Answer

You could use urllib.parse to parse the URL and then its query string. The benefit of this approach is that it works the same way if the order of the query parameters changes or new parameters are added:

from urllib import parse

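urls = [
    'www.example.com/search?q=Term&page=0',
    'www.example.com/search?q=Term&page=1',
    'www.example.com/search?q=Term&page=2'
]

for url in urls:
    # Split the URL into its components, then parse the query string into a dict.
    parts = parse.urlparse(url)
    query = parse.parse_qs(parts.query)
    # Each value in the dict is a list, so take the first 'page' entry.
    print('page{}.html'.format(query['page'][0]))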

Output:

page0.html
page1.html
page2.html
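
To illustrate the point about parameter order, here is a minimal sketch that wraps the same two calls in a helper. The function name page_file_name and the extra lang parameter are assumptions made up for this example; they do not come from the question.

```python
from urllib import parse


def page_file_name(url):
    """Return 'page<N>.html' built from the 'page' query parameter of url."""
    query = parse.parse_qs(parse.urlparse(url).query)
    # parse_qs maps each key to a list of values; use the first one.
    return 'page{}.html'.format(query['page'][0])


# Hypothetical URLs: same parameters in a different order, plus an extra one.
print(page_file_name('http://www.example.com/search?q=Term&page=3'))
print(page_file_name('http://www.example.com/search?page=3&lang=en&q=Term'))
```

Both calls should give the same file name, which is something the chained replace() approach cannot guarantee once the query string changes shape.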

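In the code above, urlparse returns a ParseResult object that contains the URL components. Because this URL has no scheme, the host name ends up in path rather than netloc, but the query string is still split off correctly: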

>>> from urllib import parse
>>> parts = parse.urlparse('www.example.com/search?q=Term&page=0')
>>> parts
ParseResult(scheme='', netloc='', path='www.example.com/search', params='', query='q=Term&page=0', fragment='')

Then parse_qs returns a dict of the query parameters, where each value is a list:

>>> query = parse.parse_qs(parts.query)
>>> query
{'q': ['Term'], 'page': ['0']}
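
Since the values are lists, a repeated parameter keeps all of its values, and a missing parameter raises KeyError if indexed directly. The following is a small sketch of one way to handle both cases; the helper name first_value and the sort parameter are hypothetical and only serve the illustration.

```python
from urllib import parse


def first_value(query_string, name, default=None):
    """Return the first value of parameter `name`, or `default` if it is absent."""
    query = parse.parse_qs(query_string)
    # dict.get with a one-element default list avoids a KeyError.
    return query.get(name, [default])[0]


# A repeated key collects every value, in the order they appear.
repeated = parse.parse_qs('q=Term&page=0&page=1')
print(repeated['page'])

# 'sort' is not in the query string, so the default is returned instead.
print(first_value('q=Term&page=0', 'page'))
print(first_value('q=Term&page=0', 'sort', 'none'))
```

Whether to fall back to a default or to let the KeyError propagate depends on whether a URL without a page parameter should be treated as an error in your crawler.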

Tags: python, replace, strip

Yannis Dran
