Here is the Python code:
from urllib.parse import urlparse
url = "http://www.phonebook.com.pk/dynamic/search.aspx?searchtype=cat&class_id=4520&page=1"
path = urlparse(url)
print (path)
>>>ParseResult(scheme='http', netloc='www.phonebook.com.pk', path='/dynamic/search.aspx', params='', query='searchtype=cat&class_id=4520&page=1', fragment='')
print (path.path)
>>>/dynamic/search.aspx
Now I need to change path.path to fit my requirement. For example, if the path is "/dynamic/search.aspx", I only need the part between the first slash and the last slash, including the slashes, which is "/dynamic/".
I have tried the following two lines, but the result is not what I expected. My knowledge of urllib.parse is insufficient, which is why I am asking this question.
path = path.path[:path.path.index("/")]
print (path)
>>>Returns nothing.
path = path.path[path.path.index("/"):]
>>>/dynamic/search.aspx (as it was before, no change.)
In short, whatever path.path contains, I need only the directory names. For example, if the path is "dynamic/search/search.aspx", I need "dynamic/search/".
First, the desired part of the path can be obtained using rfind, which returns the index of the last occurrence. The + 1 keeps the trailing slash.
desired_path = path.path[:path.path.rfind("/") + 1]
Second, use the _replace method to replace the path attribute of the ParseResult object returned by urlparse, as follows:
desired_url = urlunparse(path._replace(path=desired_path))
The full working example:
from urllib.parse import urlparse, urlunparse
url = "http://www.phonebook.com.pk/dynamic/search/search.aspx"
path = urlparse(url)
desired_path = path.path[:path.path.rfind("/") + 1]
desired_url = urlunparse(path._replace(path=desired_path))
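As a minimal sketch, the two steps above can be wrapped in a small helper. The function name directory_url and the choice to drop the query string and fragment are assumptions for illustration, not part of the original answer:

```python
from urllib.parse import urlparse, urlunparse


def directory_url(url):
    """Return url with its path cut after the last slash (hypothetical helper)."""
    parts = urlparse(url)
    # Keep everything up to and including the last "/" in the path.
    directory = parts.path[:parts.path.rfind("/") + 1]
    # Assumption: the query string and fragment belong to the file, so drop them.
    return urlunparse(parts._replace(path=directory, query="", fragment=""))


print(directory_url("http://www.phonebook.com.pk/dynamic/search/search.aspx"))
# http://www.phonebook.com.pk/dynamic/search/
```

If the query string should be kept, leave query and fragment out of the _replace call.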
I looked through urlparse for a method that could help in your situation but didn't find one (I may have overlooked it). At this level, you will probably have to write your own helper or hack:
>>> path.path
'/dynamic/search.aspx'
>>> import re
>>> d = re.search(r'/.*/', path.path)
>>> d.group(0)
'/dynamic/'
This is just one example; you can also use built-in string methods, like so:
>>> i = path.path.index('/', 1)
>>>
>>> path.path[:i+1]
'/dynamic/'
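Note that index('/', 1) stops at the second slash, so it only returns the first directory. A short sketch of the difference on a deeper path (the variable names here are illustrative):

```python
path = "/dynamic/search/search.aspx"

# index("/", 1) finds the first slash after position 0,
# i.e. the end of the first directory only.
first_dir = path[:path.index("/", 1) + 1]

# rfind("/") finds the last slash, i.e. the end of the whole directory part.
all_dirs = path[:path.rfind("/") + 1]

print(first_dir)  # /dynamic/
print(all_dirs)   # /dynamic/search/
```

For paths with more than one directory level, rfind is probably the safer choice.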
EDIT:
I didn't notice your last example, so here is another way:
>>> import os
>>> path = os.path.dirname(path.path) + os.sep
>>> path
'/dynamic/'
>>> s = 'dynamic/search/search.aspx'
>>> path = os.path.dirname(s) + os.sep
>>> path
'dynamic/search/'
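One caveat: os.sep is "\\" on Windows, so the os.path version appends a backslash there. A possible platform-independent sketch uses posixpath, which always works with "/". The helper name url_dirname is hypothetical:

```python
import posixpath


def url_dirname(path):
    """Return the directory part of a URL path, with a trailing slash."""
    # posixpath.dirname always splits on "/", regardless of the operating system.
    directory = posixpath.dirname(path)
    # Avoid doubling the slash when the directory is already "/",
    # and leave an empty directory empty.
    if directory and not directory.endswith("/"):
        directory += "/"
    return directory


print(url_dirname("/dynamic/search.aspx"))        # /dynamic/
print(url_dirname("dynamic/search/search.aspx"))  # dynamic/search/
```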
Or with re:
>>> s
'dynamic/search/search.aspx'
>>> d = re.search(r'.*/', s)
>>> d
<_sre.SRE_Match object; span=(0, 15), match='dynamic/search/'>
>>> d.group(0)
'dynamic/search/'
>>>
>>> s = '/dynamic/search.aspx'
>>> d = re.search(r'.*/', s)
>>> d.group(0)
'/dynamic/'