replace() not working on characters in Python [duplicate]

I am writing a crawler with Beautiful Soup and have the following code in it:

        print soup.originalEncoding
        #self.addtoindex(page, soup)

        links=soup('a')
        for link in links:
            if('href' in dict(link.attrs)):
                link['href'].replace('..', '')
                url=urljoin(page, link['href'])
                if url.find("'") != -1:
                    continue
                url = url.split('?')[0]
                url = url.split('#')[0]
                if url[0:4] == 'http':
                    newpages.add(url)
    pages = newpages

The link['href'].replace('..', '') call is supposed to fix links that come out as ../contact/orderform.aspx, ../contact/requestconsult.aspx, etc. However, it is not working: the links still have the leading "..". Is there something I am missing?

sdiener asked Aug 26 '11 18:08


1 Answer

str.replace() returns a new string with the replacements applied. It doesn't modify the original, because Python strings are immutable, so the result in your code is simply thrown away. Assign it back instead:

link['href'] = link['href'].replace("..", "") 
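
To see the difference in isolation, here is a minimal sketch using one of the hrefs from the question. The names `href` and `cleaned` are only for illustration, and the snippet is written to run on Python 3 even though the question's code is Python 2.

```python
href = "../contact/orderform.aspx"

# Calling replace() without using its return value has no lasting effect:
# the new string is built and then discarded, and href stays as it was.
href.replace("..", "")
assert href == "../contact/orderform.aspx"

# Binding the result to a name (or back to link['href']) keeps the change.
cleaned = href.replace("..", "")
assert cleaned == "/contact/orderform.aspx"
```

The same rule applies to other string methods such as strip(), lower() and upper(): each returns a new string rather than changing the one it was called on.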
joel goldstick answered Oct 06 '22 16:10