Use lapply on a subset of list elements and return list of same length as original in R

Question

I want to apply a regex operation to a subset of list elements (which are character strings) using lapply and return a list of same length as the original. The list elements are long strings (derived from reading in long text files and collapsing paragraphs into a single string). The regex operation is valid only for the subset of list elements/strings. I want the non-subsetted list elements (character strings) to be returned in their original state.

The regex operation is str_extract from the stringr package, i.e. I want to extract a substring from a longer string. I subset the list elements based on a regex pattern in the filename.

An example with simplified data:

library(stringr)
texts <- as.list(c("abcdefghijkl", "mnopqrstuvwxyz", "ghijklmnopqrs", "uvwxyzabcdef"))
filenames <- c("AB1997R.txt", "BG2000S.txt", "MN1999R.txt", "DC1997S.txt")
names(texts) <- filenames
regexp <- "abcdef"

I know in advance to which strings I want to apply the regex operation, and hence I want to subset these strings. That is, I don't want to run the regex over all elements in the list, as doing so will return some invalid results (which is not apparent in this simplified example).

I've made a few naive efforts, e.g.:

x <- lapply(texts[str_detect(names(texts), "1997")], str_extract, regexp)
> x
$AB1997R.txt
[1] "abcdef"

$DC1997S.txt
[1] "abcdef"

which returns a reduced-length list containing just the substrings found. But the results I want to get are:

> x
$AB1997R.txt
[1] "abcdef"

$BG2000S.txt
[1] "mnopqrstuvwxyz"

$MN1999R.txt
[1] "ghijklmnopqrs"

$DC1997S.txt
[1] "abcdef"

where the strings not containing the regex pattern are returned in their original state.

I have informed myself about stringr, lapply and llply (in the plyr package), but many operations are illustrated using dataframes as examples, not lists, and don't involve regex operations on character strings. I can achieve my goal using a for loop, but I'm trying to get away from that, as is generally advised, and get better at using the apply-class of functions.

sgibb · Accepted Answer

You can use the subset operator [<-:

x <- texts
is1997 <- str_detect(names(texts), "1997")
x[is1997] <- lapply(texts[is1997], str_extract, regexp)
x
# $AB1997R.txt
# [1] "abcdef"
#
# $BG2000S.txt
# [1] "mnopqrstuvwxyz"
#
# $MN1999R.txt
# [1] "ghijklmnopqrs"
#
# $DC1997S.txt
# [1] "abcdef"
#

Use lapply on a subset of list elements and return list of same length as original in R

Tags:

regex

r

lapply

plyr

Brigitte

1 Answers

sgibb

Recent Activity

Donate For Us

Use lapply on a subset of list elements and return list of same length as original in R

Tags:

regex

r

lapply

plyr

Brigitte

1 Answers

sgibb

Related questions

Recent Activity

Donate For Us