I have the following df
dat <- data.frame(Cases = c("Student3","Student3","Student3","Student1","Student1",
"Student2","Student2","Student2","Student4"), Class = rep("Math", 9),
Scores = c(9,5,2,7,3,8,5,1,7), stringsAsFactors = F)
> dat
Cases Class Scores
1 Student3 Math 9
2 Student3 Math 5
3 Student3 Math 2
4 Student1 Math 7
5 Student1 Math 3
6 Student2 Math 8
7 Student2 Math 5
8 Student2 Math 1
9 Student4 Math 7
On the other hand, I have another df with the following information:
d <- data.frame(Cases = c("Student3", "Student1",
"Student2", "Student4"), Class = rep("Math", 4), stringsAsFactors = F)
Cases Class
1 Student3 Math
2 Student1 Math
3 Student2 Math
4 Student4 Math
With these two, I want to extract the highest scores
for each student
. So my output would look like this:
> dat_output
Cases Class Scores
1 Student3 Math 9
2 Student1 Math 7
3 Student2 Math 8
4 Student4 Math 7
I tried with merge
but it is not extracting just the highest scores
.
We can use sapply
on each Cases
in d
, subset the dat
for that Cases
and get the max
score for it.
sapply(d$Cases, function(x) max(dat$Scores[dat$Cases %in% x]))
#Student3 Student1 Student2 Student4
# 9 7 8 7
To get the result as data.frame
transform(d, Scores = sapply(d$Cases, function(x)
max(dat$Scores[dat$Cases %in% x])))
# Cases Class Scores
# Student3 Math 9
# Student1 Math 7
# Student2 Math 8
# Student4 Math 7
Note - I have assumed your d
to be
d <- data.frame(Cases = c("Student3", "Student1",
"Student2", "Student4"), Class = rep("Math", 4), stringsAsFactors = F)
If I am correct you don't need d
, since in d
there is no additional information that is not in dat
already.
You can just do:
dat_output <- aggregate(Scores ~ Cases, dat, max)
dat_output
Cases Scores
1 Student1 7
2 Student2 8
3 Student3 9
4 Student4 7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With