import re str="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi" str2=re.match("[a-zA-Z]*//([a-zA-Z]*)",str) print str2.group() current result=> error expected => wwwqqqzzz
I want to extract the string wwwqqqzzz
. How I do that?
Maybe there are a lot of dots, such as:
"whatever..s#[email protected].:af//wwww.xxx.yn.zsdfsd.asfds.f.ds.fsd.whatever/123.dfiid"
In this case, I basically want the stuff bounded by //
and /
. How do I achieve that?
One additional question:
import re str="xxx.yyy.xxx:80" m = re.search(r"([^:]*)", str) str2=m.group(0) print str2 str2=m.group(1) print str2
Seems that m.group(0)
and m.group(1)
are the same.
Both return the first match of a substring found in the string, but re. match() searches only from the beginning of the string and return match object if found. But if a match of substring is found somewhere in the middle of the string, it returns none.
In python programming we can check whether strings are equal or not using the “==” or by using the “. __eq__” function. Example: s1 = 'String' s2 = 'String' s3 = 'string' # case sensitive equals check if s1 == s2: print('s1 and s2 are equal.
match
tries to match the entire string. Use search
instead. The following pattern would then match your requirements:
m = re.search(r"//([^/]*)", str) print m.group(1)
Basically, we are looking for /
, then consume as many non-slash characters as possible. And those non-slash characters will be captured in group number 1.
In fact, there is a slightly more advanced technique that does the same, but does not require capturing (which is generally time-consuming). It uses a so-called lookbehind:
m = re.search(r"(?<=//)[^/]*", str) print m.group()
Lookarounds are not included in the actual match, hence the desired result.
This (or any other reasonable regex solution) will not remove the .
s immediately. But this can easily be done in a second step:
m = re.search(r"(?<=//)[^/]*", str) host = m.group() cleanedHost = host.replace(".", "")
That does not even require regular expressions.
Of course, if you want to remove everything except for letters and digits (e.g. to turn www.regular-expressions.info
into wwwregularexpressionsinfo
) then you are better off using the regex version of replace
:
cleanedHost = re.sub(r"[^a-zA-Z0-9]+", "", host)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With