Remove a substring using Python

I have already extracted some information from a forum. This is the raw string I have now:

string = 'i think mabe 124 + <font color="black"><font face="Times New Roman">but I don\'t have a big experience it just how I see it in my eyes <font color="green"><font face="Arial">fun stuff' 

What I do not like are the substrings "<font color="black"><font face="Times New Roman">" and "<font color="green"><font face="Arial">". I want to keep the rest of the string and drop only these parts, so the result should look like this:

resultString = "i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff" 

How can I do this? I used Beautiful Soup to extract the string above from the forum, but I would now prefer to use a regular expression to remove these parts.

Wenhao.SHE asked Jan 02 '12 16:01




2 Answers

import re
re.sub('<.*?>', '', string)
"i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"

The re.sub function takes a regular expression and replaces every match in the string with the second argument. In this case, we search for all tags ('<.*?>') and replace them with nothing ('').

The ? after the * makes the match non-greedy, so each match stops at the first > instead of running on to the last > in the string.
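
To illustrate why the ? matters, here is a small sketch comparing the greedy and non-greedy patterns on the string from the question. The variable names (raw, greedy, lazy) are only for illustration and are not part of the original answer.

```python
import re

# The raw string from the question (variable name chosen for this example).
raw = ('i think mabe 124 + <font color="black"><font face="Times New Roman">'
       "but I don't have a big experience it just how I see it in my eyes "
       '<font color="green"><font face="Arial">fun stuff')

# Greedy: .* runs from the first "<" to the LAST ">" in the string,
# so the text between the two groups of tags is removed as well.
greedy = re.sub(r'<.*>', '', raw)

# Non-greedy: .*? stops at the first ">" after each "<",
# so only the tags themselves are removed.
lazy = re.sub(r'<.*?>', '', raw)

print(greedy)
print(lazy)
```

With the greedy pattern, everything between the first tag and the last tag is lost, which is why the non-greedy form is the one used here.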

More about the re module.

juliomalegria answered Sep 20 '22 01:09


>>> import re
>>> st = " i think mabe 124 + <font color=\"black\"><font face=\"Times New Roman\">but I don't have a big experience it just how I see it in my eyes <font color=\"green\"><font face=\"Arial\">fun stuff"
>>> re.sub("<.*?>","",st)
" i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"
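
If the same cleanup is needed in several places, one possible sketch is to wrap it in a small helper. The names TAG_RE and strip_tags are hypothetical, and the pattern '<[^>]*>' is a variant of the one above, not the pattern from the answers: because [^>] also matches newlines, it handles a tag that is split across lines, which '<.*?>' does not do without the re.DOTALL flag. It is still a simple regex and will not cope with every kind of HTML (for example a > inside an attribute value).

```python
import re

# Hypothetical helper names, chosen for this example.
# "<", then any run of characters that are not ">", then ">".
TAG_RE = re.compile(r'<[^>]*>')


def strip_tags(text):
    """Return text with every <...> span removed."""
    return TAG_RE.sub('', text)


st = ('i think mabe 124 + <font color="black"><font face="Times New Roman">'
      "but I don't have a big experience it just how I see it in my eyes "
      '<font color="green"><font face="Arial">fun stuff')

print(strip_tags(st))

# A tag broken across two lines is still removed.
print(strip_tags('a <font\ncolor="x">b'))
```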
Abhijit answered Sep 18 '22 01:09