I have a list of txt files stored in A.path that I would like to use grep on to find the year associated with each file, and save this year to a vector. However, as some of these txt files have multiple years in their text, I would only like to store the first year. How can I do this?
I've done similar things using lapply, and this is how I began approaching this problem:
lapply(A.path, function(i){
  j <- paste0(scan(i, what = character(), comment.char='', quote=NULL), collapse = " ")
  year <- vector()
  year[i] <- grep('[0-9][0-9][0-9][0-9]', j)
})
grep probably isn't the right function to use, as this returns the entirety of j for each i. What is the right function to use here?
For background: grep returns the indices of the vector elements that contain a match, not the matched text. grepl, in contrast, returns a logical vector indicating for each element whether a match was found (TRUE) or not (FALSE), so it can be used to subset the original vector (for example, to pick out the entries of state.name that begin with "New").
From the R documentation: grep, grepl, regexpr, gregexpr, regexec and gregexec search for matches to argument pattern within each element of a character vector; they differ in the format of and amount of detail in the results. sub and gsub perform replacement of the first and all matches respectively.
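To make the difference between these functions concrete, here is a small sketch on a made-up character vector (the vector x and the variable names are only for illustration, they are not from the question):

```r
# Illustrative input: three strings, the second has no four-digit run
x <- c("report 1999", "no year here", "from 2004 to 2010")
pattern <- "[0-9]{4}"

idx <- grep(pattern, x)      # integer indices of the elements that match
hit <- grepl(pattern, x)     # one TRUE/FALSE per element
m   <- regexpr(pattern, x)   # start of the FIRST match per element, -1 if none
first <- regmatches(x, m)    # matched text, only for elements that matched

print(idx)
print(hit)
print(first)
```

Note that regmatches() silently drops elements with no match, so first can be shorter than x.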
Converting comment to answer: you can use gsub with \\1 to extract the value of the first match (i.e. the text matched between the parentheses () in the regex):
gsub(".*?([0-9]{4}).*", "\\1", j)
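One caveat with the gsub approach is that a string with no four-digit run comes back unchanged. As an alternative sketch (not the method above), regexpr plus regmatches can return NA in that case. The helper name first_year and the temporary files standing in for A.path are assumptions made up for this example, and [0-9]{4} will also match the first four digits of a longer number:

```r
# Hypothetical helper: first four-digit run in a file, or NA if there is none
first_year <- function(path) {
  txt <- paste(readLines(path, warn = FALSE), collapse = " ")
  m <- regexpr("[0-9]{4}", txt)          # position of the first match, -1 if none
  if (m == -1L) NA_character_ else regmatches(txt, m)
}

# Two temporary files stand in for the real A.path (assumption for the demo)
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("Published in 1987.", "Revised 2003."), f1)
writeLines("No dates at all.", f2)
A.path <- c(f1, f2)

# vapply returns a plain character vector, one entry per file
years <- vapply(A.path, first_year, character(1), USE.NAMES = FALSE)
print(years)
```

Using vapply instead of lapply gives the vector asked for in the question directly, rather than a list that still has to be flattened.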