So I want to subset my data frame to select rows with a daily maximum value.
Site Year Day Time Cover Size TempChange
ST1 2011 97 0.0 Closed small 0.97
ST1 2011 97 0.5 Closed small 1.02
ST1 2011 97 1.0 Closed small 1.10
A section of the data frame is above. I would like to select only the rows that have the maximum value of TempChange for each value of Day. I want to do this because I am interested in specific variables (not shown) at these particular times.
AMENDED EXAMPLE AND REQUIRED OUTPUT
Site Day Temp Row
a 10 0.2 1
a 10 0.3 2
a 11 0.5 3
a 11 0.4 4
b 10 0.1 5
b 10 0.8 6
b 11 0.7 7
b 11 0.6 8
c 10 0.2 9
c 10 0.3 10
c 11 0.5 11
c 11 0.8 12
REQUIRED OUTPUT
Site Day Temp Row
a 10 0.3 2
a 11 0.5 3
b 10 0.8 6
b 11 0.7 7
c 10 0.3 10
c 11 0.8 12
Hope that makes it clearer.
To find the maximum across two or more columns for each row of an R data frame, use the pmax() function.
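As a minimal sketch of the pmax() approach, here is a made-up data frame (the column names x, y and z are hypothetical, not from the question) with the per-row maximum added as a new column:

```r
# Hypothetical data; x, y and z are illustrative column names only
df <- data.frame(x = c(1, 5, 3), y = c(4, 2, 6), z = c(0, 9, 2))

# pmax() compares its arguments element by element,
# so each element of the result is the maximum for one row
df$row_max <- pmax(df$x, df$y, df$z)

print(df)
```

Note that this gives the maximum across columns within a row, which is a different problem from picking the row with the maximum value within a group.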
The row-wise maximum of a data frame (the maximum value of each row) can be calculated with the rowMaxs() function from the matrixStats package. Another way to get the row maximum is the base R apply() function. It can also be calculated with the dplyr package.
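A rough sketch of the apply() route, using only base R and a made-up numeric data frame (names are hypothetical). The matrixStats call is left as a comment because it assumes that package is installed:

```r
# Hypothetical all-numeric data frame
m <- data.frame(x = c(1, 5, 3), y = c(4, 2, 6), z = c(0, 9, 2))

# MARGIN = 1 applies max() to each row in turn
row_max_apply <- apply(m, 1, max)

# The same result by handing every column to pmax()
row_max_pmax <- do.call(pmax, m)

# If matrixStats is installed, this should be equivalent:
# row_max_ms <- matrixStats::rowMaxs(as.matrix(m))

print(row_max_apply)
```

apply() converts the data frame to a matrix first, so this only makes sense when all the columns are numeric.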
Using bracket notation on an R data frame (data.frame), we can select rows by column value, by index, by name, by condition, etc. The base R function subset() gives the same results. Besides these, the dplyr package provides filter() to select rows from a data frame.
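To make the three options concrete, here is a small sketch on made-up data (the 0.4 threshold is arbitrary). The dplyr call is guarded so the snippet still runs if dplyr is not installed:

```r
# Hypothetical data shaped like the question's example
df <- data.frame(Site = c("a", "a", "b"),
                 Day  = c(10, 11, 10),
                 Temp = c(0.3, 0.5, 0.8))

# Bracket notation: a logical condition in the row position
by_bracket <- df[df$Temp > 0.4, ]

# subset(): same rows, column names can be used unquoted
by_subset <- subset(df, Temp > 0.4)

# dplyr::filter(), only attempted if the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  by_filter <- dplyr::filter(df, Temp > 0.4)
}

print(by_bracket)
```

One difference worth knowing: bracket notation returns NA rows when the condition is NA, whereas subset() and filter() drop them.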
We can find the minimum and the maximum of a vector using the min() and max() functions. A function called range() is also available, which returns the minimum and maximum as a two-element vector.
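A quick illustration on a made-up vector. which.max() is included as well, since it returns the position of the maximum rather than its value, which is what row selection needs:

```r
# Hypothetical temperature readings
temps <- c(0.2, 0.3, 0.5, 0.4)

lo  <- min(temps)        # smallest value
hi  <- max(temps)        # largest value
rng <- range(temps)      # c(min, max) in one call
pos <- which.max(temps)  # index of the first maximum

print(rng)
print(pos)
```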
After faffing with raw data frame code, I realised plyr could do this in one line (load it first with library(plyr)):
> df
Day V Z
1 97 0.26575207 1
2 97 0.09443351 2
3 97 0.88097858 3
4 98 0.62241515 4
5 98 0.61985937 5
6 99 0.06956219 6
7 100 0.86638108 7
8 100 0.08382254 8
> ddply(df,~Day,function(x){x[which.max(x$V),]})
Day V Z
1 97 0.88097858 3
2 98 0.62241515 4
3 99 0.06956219 6
4 100 0.86638108 7
To get the rows with the max values for unique combinations of more than one column, just add the extra variable to the formula. For your modified example, it's then:
> df
Site Day Temp Row
1 a 10 0.2 1
2 a 10 0.3 2
3 a 11 0.5 3
4 a 11 0.4 4
5 b 10 0.1 5
6 b 10 0.8 6
7 b 11 0.7 7
8 b 11 0.6 8
9 c 10 0.2 9
10 c 10 0.3 10
11 c 11 0.5 11
12 c 11 0.8 12
> ddply(df,~Day+Site,function(x){x[which.max(x$Temp),]})
Site Day Temp Row
1 a 10 0.3 2
2 b 10 0.8 6
3 c 10 0.3 10
4 a 11 0.5 3
5 b 11 0.7 7
6 c 11 0.8 12
Note this isn't in the same order as your original dataframe, but you can fix that by sorting on Row:
> dmax = ddply(df,~Day+Site,function(x){x[which.max(x$Temp),]})
> dmax[order(dmax$Row),]
Site Day Temp Row
1 a 10 0.3 2
4 a 11 0.5 3
2 b 10 0.8 6
5 b 11 0.7 7
3 c 10 0.3 10
6 c 11 0.8 12
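If you would rather not depend on plyr, something along these lines should work in base R. This is a sketch using ave(), not the approach above, and it rebuilds the amended example data by hand:

```r
# The amended example from the question, rebuilt by hand
df <- data.frame(Site = rep(c("a", "b", "c"), each = 4),
                 Day  = rep(c(10, 10, 11, 11), times = 3),
                 Temp = c(0.2, 0.3, 0.5, 0.4,
                          0.1, 0.8, 0.7, 0.6,
                          0.2, 0.3, 0.5, 0.8),
                 Row  = 1:12)

# ave() returns, for every row, the max Temp of its Site/Day group
group_max <- ave(df$Temp, df$Site, df$Day, FUN = max)

# Keep the rows whose Temp equals their group maximum
dmax <- df[df$Temp == group_max, ]

print(dmax)
```

Two differences from the ddply version: the rows stay in their original order, so no re-sorting is needed, and if a group has tied maxima all of the tied rows are kept, whereas which.max() keeps only the first.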