
How to create a subset of elements in multiple data frames that are greater than elements in a different list?

Tags: list, r, lapply

Sorry if I formatted this incorrectly or if the title isn't quite right; I am new to R and Stack Overflow. I am working with a list (called climates) of 20 data frames, one per province, that each have year, month, day, and temperature columns (along with some other columns). I want to find the rows where the temperature is above a certain threshold, but the threshold is different for each province. I've been able to use lapply to find the threshold for each province, but when I try to use those thresholds to find the rows where the temperature is above the threshold, the output isn't correct. My code does return a bunch of numbers, but they don't seem to correspond to values greater than the threshold, and I also don't know how to get it to return the entire row instead of just the temperature value.

example climate list:

A <- data.frame("D" = c(1:30), "T" = c(sample(10:30, size = 30, replace = TRUE)))
B <- data.frame("D" = c(1:30), "T" = c(sample(4:22, size = 30, replace = TRUE)))
C <- data.frame("D" = c(1:30), "T" = c(sample(14:35, size = 30, replace = TRUE)))

climate <- list("Alist" = A, "Blist" = B, "Clist" = C)
climate

I've used lapply to find the threshold,

thresh95 <- lapply(lapply(
  climate, `[[`, 2), # this one takes my list of climate data and selects the T column for all provinces
  quantile, probs = c(0.95), na.rm = TRUE) # this one takes the previous list and finds 95th percentile value
thresh95

but when I try to then find the temperatures that are above the threshold, something goes wrong.

tmax95 <-  lapply(lapply(climate, `[[`, 2), # this one takes my list of climate data and selects the T column for all provinces
  function(x) x[which(x>thresh95)])# this one takes my list of climate data and selects the temps that are greater than the threshold
tmax95

Is there a way to write something that will return a subset of each province's data frame where the condition is that the temperature is greater than the threshold? Thanks!

user26711711 asked Oct 28 '25 02:10


2 Answers

What's Wrong?

Your thresh95 is a list like

> thresh95
$Alist
95%
 29

$Blist
95%
 22

$Clist
95%
 34

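but x is just a vector. So x > thresh95 compares each temperature vector against the whole list of thresholds, not against that province's own threshold. R recycles the three thresholds along x, which is why the values you get back look unrelated to any single threshold.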
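To see concretely what happens when a numeric vector is compared with a list of length-one thresholds, here is a minimal sketch. The names temps and thresholds and all the values are made up for illustration; they are not from the question's data.

```r
# Hypothetical temperatures for ONE province (doubles, for illustration only)
temps <- c(10, 30, 35, 20, 25, 40)

# Hypothetical per-province thresholds, stored as a list like thresh95
thresholds <- list(Alist = 29, Blist = 22, Clist = 34)

# Comparing against the whole list: the three thresholds are recycled
# along temps (29, 22, 34, 29, 22, 34), so each element is tested
# against a different province's threshold
recycled <- temps > thresholds

# Comparing against this province's own threshold is what was intended
intended <- temps > thresholds[["Alist"]]

print(recycled)
print(intended)
```

The two logical vectors differ wherever the recycled threshold is not the province's own, so the comparison has to be made one province at a time.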

Workaround Option

You can run the code below (data borrowed from @Edward)

lapply(
  climate,
  function(x) {
    subset(
      x,
      T > quantile(T, probs = 0.95)
    )
  }
)

which gives

$Alist
    D  T
19 19 30

$Blist
[1] D T
<0 rows> (or 0-length row.names)

$Clist
    D  T
17 17 35
ThomasIsCoding answered Oct 30 '25 16:10


You need mapply, which walks through the data frames and their thresholds in parallel, one pair at a time.

But first, always set the seed when simulating data.

set.seed(1234)

A <- data.frame("D" = c(1:30), "T" = c(sample(10:30, size = 30, replace = TRUE)))
B <- data.frame("D" = c(1:30), "T" = c(sample(4:22, size = 30, replace = TRUE)))
C <- data.frame("D" = c(1:30), "T" = c(sample(14:35, size = 30, replace = TRUE)))

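# rebuild the list and the thresholds from the question with the seeded data
climate <- list("Alist" = A, "Blist" = B, "Clist" = C)
thresh95 <- lapply(lapply(climate, `[[`, 2), quantile, probs = c(0.95), na.rm = TRUE)

# x is one data frame, y is its threshold; keep the rows where column 2 (T) exceeds y
mapply(\(x,y) x[which(x[,2] > y),], x=climate, y=thresh95, SIMPLIFY=FALSE)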

$Alist
    D  T
19 19 30

$Blist
[1] D T
<0 rows> (or 0-length row.names)

$Clist
    D  T
17 17 35
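As a side note, base R's Map is a thin wrapper around mapply with SIMPLIFY = FALSE, so the same pairing of data frames and thresholds can be written with it. The sketch below is self-contained; the helper name rows_above is hypothetical and only there for illustration.

```r
set.seed(1234)  # seeded so the simulated data is reproducible

climate <- list(
  Alist = data.frame(D = 1:30, T = sample(10:30, size = 30, replace = TRUE)),
  Blist = data.frame(D = 1:30, T = sample(4:22, size = 30, replace = TRUE)),
  Clist = data.frame(D = 1:30, T = sample(14:35, size = 30, replace = TRUE))
)

# One 95th-percentile threshold per province, kept as a named list
thresh95 <- lapply(climate, function(df) quantile(df$T, probs = 0.95, na.rm = TRUE))

# Hypothetical helper: rows of one data frame whose T exceeds one threshold
rows_above <- function(df, threshold) df[which(df$T > threshold), ]

# Map pairs the i-th data frame with the i-th threshold and returns a list
via_map <- Map(rows_above, climate, thresh95)

# The mapply form from the answer, for comparison
via_mapply <- mapply(rows_above, climate, thresh95, SIMPLIFY = FALSE)

print(via_map)
```

Both calls return a list with one filtered data frame per province; which one to use is mostly a matter of taste.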
Edward answered Oct 30 '25 14:10