I am trying to delete sequences of rows from a data frame. Each sequence begins with a known string and ends with a known string, but the content and number of the intervening rows are unknown. I would like to apply this across the entire data frame.
For example, given the data frame below, I would like to remove the rows from every instance of StringA through the following StringB (inclusive), but retain the rows after StringB up to the next occurrence of StringA. That is, I would like to remove the rows containing StringA, unknownC, unknownD, unknownS and StringB, retain unknownK and unknownR, then remove StringA, unknownU, unknownP and StringB, and retain unknownT.
Column 1 Column 2
StringA 1
unknownC 9
unknownD 11
unknownS 5
StringB 7
unknownK 6
unknownR 1
StringA 76
unknownU 2
unknownP 41
StringB 3
unknownT 9
I tried
df2 <- df[1:which(df[,1]=="StringA")-1,]
which is not correct, but I am at a loss as to what other approach to try. Thank you in advance for any guidance.
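You can try something like this, constructing the index of rows to remove with the Map function. It pairs each StringA position with the StringB position of the same rank, so it assumes every StringA has a matching StringB: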
indexToRemove <- unlist(Map(`:`, which(df$`Column 1` == "StringA"),
which(df$`Column 1` == "StringB")))
df[-indexToRemove, ]
Column 1 Column 2
6 unknownK 6
7 unknownR 1
12 unknownT 9
Data:
structure(list(`Column 1` = structure(c(1L, 3L, 4L, 8L, 2L, 5L,
7L, 1L, 10L, 6L, 2L, 9L), .Label = c("StringA", "StringB", "unknownC",
"unknownD", "unknownK", "unknownP", "unknownR", "unknownS", "unknownT",
"unknownU"), class = "factor"), `Column 2` = c(1L, 9L, 11L, 5L,
7L, 6L, 1L, 76L, 2L, 41L, 3L, 9L)), .Names = c("Column 1", "Column 2"
), class = "data.frame", row.names = c(NA, -12L))
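One caveat with negative indexing: if StringA never occurs, indexToRemove is empty and df[-indexToRemove, ] returns zero rows rather than the whole data frame. A minimal sketch of a logical-mask variant that avoids this is below. The helper name drop_between and its arguments are hypothetical, not part of the answer above, and it assumes the start and end markers strictly alternate, beginning with the start marker.

```r
# Hypothetical helper (not from the original answer): drop every block of rows
# running from a start marker through the next end marker, inclusive.
# Assumption: markers strictly alternate, beginning with `start`.
drop_between <- function(df, col, start, end) {
  x <- as.character(df[[col]])      # also handles factor columns
  is_start <- x %in% start          # %in% yields FALSE (not NA) for missing values
  is_end   <- x %in% end
  opened <- cumsum(is_start)            # blocks opened up to and including this row
  closed <- cumsum(is_end) - is_end     # blocks closed strictly before this row
  inside <- opened > closed             # TRUE for rows in a block, markers included
  df[!inside, , drop = FALSE]
}
```

With the data above, drop_between(df, "Column 1", "StringA", "StringB") should keep rows 6, 7 and 12, and when neither marker is present the data frame comes back unchanged.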
You can use a for loop. Although this will be slower than the vectorised solutions posted, it is easy to adapt to similar problems and is robust against unexpected input data.
Notes:
The code:
keep.line <- TRUE
out.df <- data.frame()
for (i in seq_len(nrow(my.df))) {
  # Stop keeping rows once the start marker is seen
  if (my.df$Column1[i] == "StringA") keep.line <- FALSE
  if (keep.line) out.df <- rbind(out.df, my.df[i, ])
  # Resume keeping rows after the end marker
  if (my.df$Column1[i] == "StringB") keep.line <- TRUE
}
out.df
## Column1 Column2
## unknownK 0.3679608
## unknownR -0.8867749
## unknownT 1.6277386
Some data:
Column1 <-c(
"StringA" ,
"unknownC",
"unknownD",
"unknownS",
"StringB" ,
"unknownK",
"unknownR",
"StringA" ,
"unknownU",
"unknownP",
"StringB" ,
"unknownT")
my.df <- data.frame(Column1, Column2 = rnorm(12), stringsAsFactors = FALSE)
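Growing out.df with rbind copies the data frame on every kept row, which gets slow on large inputs. A possible variant, sketched below, keeps the same loop logic but fills a preallocated logical vector and subsets once at the end. The function name keep_outside is hypothetical and only for illustration.

```r
# Hypothetical helper: same state machine as the loop above, but it records
# which rows to keep in a preallocated logical vector instead of calling rbind.
keep_outside <- function(x, start = "StringA", end = "StringB") {
  x <- as.character(x)              # accept factor columns as well
  keep <- logical(length(x))        # preallocated, all FALSE
  inside <- FALSE
  for (i in seq_along(x)) {
    if (identical(x[i], start)) inside <- TRUE    # block opens on the start marker
    keep[i] <- !inside
    if (identical(x[i], end)) inside <- FALSE     # block closes after the end marker
  }
  keep
}

# Usage with the data above:
# my.df[keep_outside(my.df$Column1), ]
```

As in the original loop, a StringA with no closing StringB drops everything up to the end of the data, and a stray StringB outside a block is kept.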