
Python re finding string between underscore and ext

Tags:

python

regex

I have the following string:

"1206292WS_R0_ws.shp"

I am trying to use re.sub to strip everything except what is between the second "_" and ".shp".

The output would be "ws" in this case.

I have managed to remove the ".shp", but for the life of me I cannot figure out how to get rid of everything up to and including the second "_".

import re

epass = "1206292WS_R0_ws.shp"
regex = re.compile(r"(\.shp$)")  # matches only the trailing ".shp"
x = re.sub(regex, "", epass)

Outputs

1206292WS_R0_ws

Desired output:

ws
Tristan Forward asked Dec 26 '22 02:12


1 Answer

You don't really need a regex for this:

print(epass.split("_")[-1].split(".")[0])
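To make the one-liner above easier to follow, here is a step-by-step breakdown. The intermediate variable names (parts, last, name) are only for illustration and are not part of the original answer.

```python
epass = "1206292WS_R0_ws.shp"

# Split on every underscore; the piece we want is the last one.
parts = epass.split("_")      # ['1206292WS', 'R0', 'ws.shp']
last = parts[-1]              # 'ws.shp'

# Split that piece on "." and keep what comes before the extension.
name = last.split(".")[0]     # 'ws'
print(name)
```

Note that this takes the text after the last underscore, which happens to be the second underscore in this particular string.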


>>> import timeit
>>> timeit.timeit("epass.split(\"_\")[-1].split(\".\")[0]", setup="from __main__ import epass")
0.57268652953933608

>>> timeit.timeit("regex.findall(epass)", setup="from __main__ import epass, regex")
0.59134766185007948

Speed is very similar for both on this short string, with the splits being a tiny bit faster.

Actually, by far the fastest method is:

print(epass.rsplit("_", 1)[-1].split(".")[0])

This takes 3 seconds on a string 100k characters long (on my system), versus 35+ seconds for either of the other methods.
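If you want to try the comparison yourself, the following is a rough sketch. The long test string and the helper names (long_epass, with_split, with_rsplit) are assumptions made up for illustration, not the input used for the numbers above, so your timings will differ.

```python
import timeit

# Hypothetical long input of roughly 100k characters: many
# underscore-separated fields followed by the part we want.
long_epass = "_".join(["x"] * 50000) + "_ws.shp"

def with_split(s):
    # Splits on every underscore, building a list of all fields.
    return s.split("_")[-1].split(".")[0]

def with_rsplit(s):
    # Scans from the right and stops after the first underscore found.
    return s.rsplit("_", 1)[-1].split(".")[0]

for fn in (with_split, with_rsplit):
    elapsed = timeit.timeit(lambda: fn(long_epass), number=200)
    print(fn.__name__, elapsed)
```

The rsplit version has less work to do because maxsplit=1 lets it stop at the first underscore from the right instead of splitting the whole string.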

If you actually mean the second _ and not the last _, then you could do:

epass.split("_", 2)[-1].split(".")[0]

although, depending on where the second _ is, a regex may be just as fast or faster.
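For completeness, here is one way a regex version might look, since the question asked for one. This is a minimal sketch: the pattern is my own suggestion and assumes the string has at least two underscores and ends in ".shp".

```python
import re

epass = "1206292WS_R0_ws.shp"

# Skip two underscore-terminated fields, then capture everything
# up to the trailing ".shp".
pattern = re.compile(r"^(?:[^_]*_){2}(.*)\.shp$")

match = pattern.search(epass)
if match:
    print(match.group(1))  # ws

# The same idea with re.sub, as in the question: delete the first
# two fields and the extension, leaving only the middle part.
stripped = re.sub(r"^(?:[^_]*_){2}|\.shp$", "", epass)
print(stripped)  # ws
```

The capture-group form returns None from search() when the string does not fit the expected shape, whereas the re.sub form silently returns whatever is left over, so the first is easier to validate.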

Joran Beasley answered Jan 10 '23 00:01