To the best of my knowledge, [0-9]{8} and eight repetitions of [0-9] should be equivalent, but I actually see a difference. Look at this minimal example from this question:
a<-c("/Cajon_Criolla_20141024","/Linon_20141115_20141130",
"/Cat/LIQUID",
"/c_puertas_20141206_20141107",
"/C_Puertas_3_20141017_20141018",
"/c_puertas_navidad_20141204_20141205")
sub("(.*?)_([0-9]{8})(.*)$","\\2",a)
[1] "20141024" "20141130" "/Cat/LIQUID" "20141107" "20141018"
[6] "20141205"
sub("(.*?)_([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])(.*)$","\\2",a)
[1] "20141024" "20141115" "/Cat/LIQUID" "20141206" "20141017"
[6] "20141204"
What am I missing? Or is this a bug?
This is a bug in the TRE library related to greedy modifiers and capture groups. See:
Setting perl=TRUE gives the same answer (as expected) for both expressions:
> sub("(.*?)_([0-9]{8})(.*)$","\\2",a,perl=TRUE)
[1] "20141024" "20141115" "/Cat/LIQUID" "20141206" "20141017" "20141204"
> sub("(.*?)_([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])(.*)$","\\2",a,perl=TRUE)
[1] "20141024" "20141115" "/Cat/LIQUID" "20141206" "20141017" "20141204"
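If all you need is the first 8-digit block that follows an underscore, one way to sidestep the capture-group issue entirely is to extract the match directly instead of substituting. The following is only a sketch under that assumption: it uses base R's regexpr() and regmatches() with perl=TRUE, and the variable name first_date is made up for illustration. Note that, unlike sub(), it yields NA for strings with no match rather than returning the input unchanged.

```r
a <- c("/Cajon_Criolla_20141024", "/Linon_20141115_20141130",
       "/Cat/LIQUID",
       "/c_puertas_20141206_20141107",
       "/C_Puertas_3_20141017_20141018",
       "/c_puertas_navidad_20141204_20141205")

# Locate the first run of 8 digits that is immediately preceded by "_".
# The lookbehind (?<=_) requires the PCRE engine, hence perl = TRUE.
m <- regexpr("(?<=_)[0-9]{8}", a, perl = TRUE)

# regmatches() returns only the substrings that matched, so fill them into
# a vector of NAs at the positions where a match was found.
first_date <- rep(NA_character_, length(a))
first_date[m != -1] <- regmatches(a, m)

first_date
```

Because no lazy quantifier or backreference is involved, this does not depend on how the default engine handles .*? together with {8}.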