I have the following df
dat <- data.frame(Cases = c("Student3","Student3","Student3","Student1","Student1",
"Student2","Student2","Student2","Student4"), Class = rep("Math", 9),
Scores = c(9,5,2,7,3,8,5,1,7), stringsAsFactors = F)
> dat
Cases Class Scores
1 Student3 Math 9
2 Student3 Math 5
3 Student3 Math 2
4 Student1 Math 7
5 Student1 Math 3
6 Student2 Math 8
7 Student2 Math 5
8 Student2 Math 1
9 Student4 Math 7
On the other hand, I have another df with the following information:
d <- data.frame(Cases = c("Student3", "Student1",
"Student2", "Student4"), Class = rep("Math", 4), stringsAsFactors = F)
Cases Class
1 Student3 Math
2 Student1 Math
3 Student2 Math
4 Student4 Math
With these two, I want to extract the highest scores for each student. So my output would look like this:
> dat_output
Cases Class Scores
1 Student3 Math 9
2 Student1 Math 7
3 Student2 Math 8
4 Student4 Math 7
I tried with merge but it is not extracting just the highest scores.
We can use sapply on each Cases in d, subset the dat for that Cases and get the max score for it.
sapply(d$Cases, function(x) max(dat$Scores[dat$Cases %in% x]))
#Student3 Student1 Student2 Student4
# 9 7 8 7
To get the result as data.frame
transform(d, Scores = sapply(d$Cases, function(x)
max(dat$Scores[dat$Cases %in% x])))
# Cases Class Scores
# Student3 Math 9
# Student1 Math 7
# Student2 Math 8
# Student4 Math 7
Note - I have assumed your d to be
d <- data.frame(Cases = c("Student3", "Student1",
"Student2", "Student4"), Class = rep("Math", 4), stringsAsFactors = F)
If I am correct you don't need d, since in d there is no additional information that is not in dat already.
You can just do:
dat_output <- aggregate(Scores ~ Cases, dat, max)
dat_output
Cases Scores
1 Student1 7
2 Student2 8
3 Student3 9
4 Student4 7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With