
Grep a variable and store the result in a vector in R

Tags: regex, r

I have a vector of paths to txt files stored in A.path. I would like to search each file for the year associated with it and save that year to a vector. Some of these files contain multiple years in their text, so I only want to store the first one. How can I do this?

I've done similar things using lapply, and this is how I began approaching this problem:

lapply(A.path, function(i) {
    j <- paste0(scan(i, what = character(), comment.char = '', quote = NULL), collapse = " ")
    year <- vector()
    year[i] <- grep('[0-9][0-9][0-9][0-9]', j)
})

grep probably isn't the right function to use, as this returns the entirety of j for each i. What is the right function to use here?

Asked by mlinegar on Jul 27 '15 at 00:07




1 Answer

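Converting my comment to an answer: you can use gsub with the backreference \\1 to extract the first match, i.e. the text captured between the parentheses in the regex. The lazy .*? consumes as little as possible before the first four-digit run and the trailing .* consumes the rest, so the whole string is replaced by the captured year. Since the pattern spans the entire string, sub would do the same job. Note that a string with no four-digit run is returned unchanged.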

gsub(".*?([0-9]{4}).*", "\\1", j)
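
To collect one year per file into a vector, something along these lines might work. This is only a sketch under a few assumptions: first_year is a hypothetical helper name, readLines is swapped in for the scan call from the question, regexpr with regmatches is used in place of gsub so that files without any year yield NA instead of their full text, and the two temporary files are made-up stand-ins for the real A.path.

```r
# Hypothetical helper (not from the original post): returns the first
# four-digit run found in a file's text, or NA if there is none.
first_year <- function(path) {
  txt <- paste(readLines(path, warn = FALSE), collapse = " ")
  m <- regexpr("[0-9]{4}", txt)  # position of the first match, -1 if none
  if (m == -1) NA_character_ else regmatches(txt, m)
}

# Stand-in for A.path: two temporary files with made-up contents.
A.path <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines("Report for 1998, revised in 2003.", A.path[1])
writeLines("No year mentioned here.", A.path[2])

# vapply returns a plain character vector with one element per file.
years <- vapply(A.path, first_year, character(1), USE.NAMES = FALSE)
print(years)
```

Using vapply rather than lapply gives a character vector directly, and it avoids the problem in the question where year is recreated inside the function on every iteration.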
Answered by Rorschach on Oct 31 '22 at 14:10