I have been playing around with list.files() and I wanted to list only 001.csv through 010.csv. I came up with this command:
list_files <- list.files(directory, pattern = ".*\\000|010", full.names = TRUE)
This code gives me what I want, but I do not fully understand what is happening with the pattern argument. How does pattern = ".*\\000" work?
\\0 is a backreference that inserts the whole regex match to that point. Compare the following to see what that can mean:
sub("he", "", "hehello")
## [1] "hello"
sub("he\\0", "", "hehello")
## [1] "llo"
With strings like "001.csv" or "009.csv", what happens is that the .* matches zero characters, the \\0 repeats those zero characters one time, and the 00 matches the first two zeros in the string. Success!
This pattern won't match "100.csv" or "010.csv" because it can't find anything to match that is doubled and then immediately followed by two 0s. (In your full command, "010.csv" is picked up by the separate |010 alternative instead.) It will, though, match "1100.csv", because it matches 1, then doubles it, and then finds two 0s.
So, to recap, ".*\\000" matches any string beginning with xx00, where x stands for any substring of zero or more characters. That is, it matches anything repeated twice and then followed by two zeros.
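If the goal is only to pick out 001.csv through 010.csv, a pattern that does not rely on \\0 may be easier to reason about. The following is a minimal sketch, not the only way to do it: the files vector is a made-up stand-in for the names in your directory, and the anchored pattern and the sprintf() approach are suggestions rather than anything from your original command.

```r
# Hypothetical file names standing in for the contents of `directory`
files <- c("001.csv", "009.csv", "010.csv", "011.csv", "100.csv", "1100.csv")

# Anchored pattern: 001 to 009, or 010, followed by a literal ".csv"
pattern <- "^(00[1-9]|010)\\.csv$"
by_regex <- files[grepl(pattern, files)]
by_regex
## [1] "001.csv" "009.csv" "010.csv"

# Alternative without a regex: build the expected names and compare
wanted <- sprintf("%03d.csv", 1:10)
by_name <- files[files %in% wanted]
by_name
## [1] "001.csv" "009.csv" "010.csv"
```

The same pattern string should work as the pattern argument of list.files(), since that function matches against the file name without the directory part.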