    import re
    str = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
    str2 = re.match("[a-zA-Z]*//([a-zA-Z]*)", str)
    print str2.group()

Current result: error
Expected: wwwqqqzzz

I want to extract the string wwwqqqzzz. How do I do that?
Maybe there are a lot of dots, such as:
"whatever..s#[email protected].:af//wwww.xxx.yn.zsdfsd.asfds.f.ds.fsd.whatever/123.dfiid" In this case, I basically want the stuff bounded by // and /. How do I achieve that?
One additional question:
    import re
    str = "xxx.yyy.xxx:80"
    m = re.search(r"([^:]*)", str)
    str2 = m.group(0)
    print str2
    str2 = m.group(1)
    print str2

It seems that m.group(0) and m.group(1) are the same.
Both re.match() and re.search() return a match object for the first match they find, but re.match() only tries the pattern at the beginning of the string. If the pattern matches only somewhere in the middle of the string, re.match() returns None, whereas re.search() still finds it.
match only tries to match at the beginning of the string, and your string does not start with the pattern. Use search instead. The following pattern would then match your requirements:
    m = re.search(r"//([^/]*)", str)
    print(m.group(1))

Basically, we look for //, then consume as many non-slash characters as possible. Those non-slash characters are captured in group number 1.
In fact, there is a slightly more advanced technique that does the same without a capturing group (capturing adds some overhead, although it rarely matters for a pattern this small). It uses a so-called lookbehind:
    m = re.search(r"(?<=//)[^/]*", str)
    print(m.group())

Lookarounds are not included in the actual match, hence the desired result.
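As a rough illustration of how group(0) and group(1) relate, the sketch below runs both patterns on the input from the question and then the pattern from the additional question. The names `captured`, `lookbehind` and `whole` are made up for this example. group(0) is always the entire match, and group(1) is the first parenthesised group, so the two are equal only when the parentheses wrap the whole pattern.

```python
import re

text = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

# Capturing version: the match includes "//", the group does not.
captured = re.search(r"//([^/]*)", text)
print(captured.group(0))  # //www.qqq.zzz
print(captured.group(1))  # www.qqq.zzz

# Lookbehind version: "//" must precede the match but is not part of it.
lookbehind = re.search(r"(?<=//)[^/]*", text)
print(lookbehind.group(0))  # www.qqq.zzz

# The pattern from the additional question: the parentheses enclose the
# entire pattern, so the whole match and group 1 are the same text.
whole = re.search(r"([^:]*)", "xxx.yyy.xxx:80")
print(whole.group(0) == whole.group(1))  # True
```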
This (or any other reasonable regex solution) will not remove the dots in the same step. But this can easily be done in a second step:
    m = re.search(r"(?<=//)[^/]*", str)
    host = m.group()
    cleanedHost = host.replace(".", "")

That does not even require regular expressions.
Of course, if you want to remove everything except for letters and digits (e.g. to turn www.regular-expressions.info into wwwregularexpressionsinfo) then you are better off using the regex version of replace:
cleanedHost = re.sub(r"[^a-zA-Z0-9]+", "", host)
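Putting the two steps together, one possible sketch looks like this. The function name `extract_clean_host` is hypothetical, and returning None when there is no `//` is an assumption about the desired behaviour, not something the question specifies. The second input is likewise an assumed URL built around the host name mentioned above.

```python
import re

def extract_clean_host(text):
    """Return the text between '//' and the next '/', keeping only letters and digits."""
    m = re.search(r"(?<=//)[^/]*", text)
    if m is None:
        # No "//" in the input; calling m.group() here would raise AttributeError.
        return None
    return re.sub(r"[^a-zA-Z0-9]+", "", m.group())

print(extract_clean_host("x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"))   # wwwqqqzzz
print(extract_clean_host("http://www.regular-expressions.info/tutorial.html"))  # wwwregularexpressionsinfo
print(extract_clean_host("no slashes here"))                                     # None
```

Checking for None before calling `group()` avoids the kind of error the original code ran into whenever the input contains no `//`.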