 

What is the purpose of .*\\?

Tags:

regex

r

I have been playing around with list.files() and I wanted to only list 001.csv through 010.csv and I came up with this command:

list_files <- list.files(directory, pattern = ".*\\000|010", full.names = TRUE)

This code gives me what I want, but I do not fully understand what is happening with the pattern argument. How does pattern = ".*\\000" work?

asked Jan 07 '15 21:01 by Chris


1 Answer

Inside a pattern, \\0 is a backreference to the whole regex match up to that point: it requires that same text to appear again. Compare the following to see what that can mean:

sub("he", "", "hehello")
## [1] "hello"
sub("he\\0", "", "hehello")
## [1] "llo"
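For comparison, here is the same doubling written with an ordinary numbered capture group, which is the more familiar form of backreference in R regular expressions. This is a sketch of the equivalent idea, not a claim about how \\0 is handled internally:

```r
# "(he)" captures "he"; "\\1" then requires that captured text again,
# so the pattern matches "hehe" and sub() deletes it.
sub("(he)\\1", "", "hehello")
## [1] "llo"

# "hello" contains no doubled "he", so nothing matches and the string is unchanged.
sub("(he)\\1", "", "hello")
## [1] "hello"
```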

With strings like "001.csv" or "009.csv", the .* matches zero characters, the \\0 matches those same zero characters once more, and the 00 matches the first two zeros in the string. Success!

This pattern won't match "100.csv" or "010.csv", because it can't find anything that is doubled and then immediately followed by two 0s ("010.csv" is picked up by the separate |010 alternative instead). It will, though, match "1100.csv": the .* matches 1, the \\0 matches a second 1, and the 00 matches the two 0s.
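The cases above can be checked with grepl(). This sketch swaps \\0 for a numbered group (.*)\\1 and anchors it with ^, so that only a doubled prefix at the very start of the name counts; the file names are just the examples from this answer:

```r
files <- c("001.csv", "009.csv", "010.csv", "100.csv", "1100.csv")

# ^      anchor at the start of the file name
# (.*)   capture a prefix, possibly empty
# \\1    require that same prefix a second time
# (00)   then two literal zeros (the parentheses only separate them from \\1)
grepl("^(.*)\\1(00)", files)
## [1]  TRUE  TRUE FALSE FALSE  TRUE
```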

So, to recap, ".*\\000" matches any string beginning with xx00, where x stands for the same substring of zero or more characters. That is, it matches anything repeated twice and then followed by two zeros.
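If the goal is simply 001.csv through 010.csv, an explicit anchored pattern avoids the backreference trick altogether. A minimal sketch, assuming a throwaway temporary directory (demo_dir is a hypothetical name) filled with 001.csv to 020.csv:

```r
# Hypothetical setup: a temporary directory holding 001.csv ... 020.csv
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, sprintf("%03d.csv", 1:20))))

# 00[1-9] covers 001-009 and 010 covers the tenth file; \\. is a literal dot,
# and ^...$ keeps the pattern from matching inside longer names.
first_ten <- list.files(demo_dir, pattern = "^(00[1-9]|010)\\.csv$")
first_ten
```

Adding full.names = TRUE, as in the question, returns full paths instead of bare file names.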

answered Oct 10 '22 08:10 by Josh O'Brien