I have a data.frame (say "df") that looks like the following:
Hospital.Name | State | Mortality.Rate
'hospital_1' | 'AA' | 0.2
'hospital_2' | 'AA' | 0.3
'hospital_3' | 'BB' | 0.3
'hospital_4' | 'CC' | 0.5
(The Hospital.Name is unique)
Now I want to order "Mortality.Rate" grouped by "State", i.e. order the rates within each state. If there is a tie in the rate, "Hospital.Name" is used to resolve the tie.
The "order()" and "tapply()" functions came to mind, so I coded it like this:
tapply(df$Mortality.Rate, df$State, order, df$Hospital.Name, na.last=NA)
However, an "argument lengths differ" error popped up. When "order" is applied to a slice of the rates, its second argument (i.e. df$Hospital.Name) is not sliced along with it.
How can I pass the second argument (for resolving ties in the ordering) to tapply(), or are there any other approaches?
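One possible way around the unsliced second argument, offered only as a sketch: split the whole data frame by state first, so that both the rate and the name are sliced together before order() sees them. The data frame below is rebuilt from the table in the question, and `by_state` is a name made up for this example.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  Hospital.Name  = c("hospital_1", "hospital_2", "hospital_3", "hospital_4"),
  State          = c("AA", "AA", "BB", "CC"),
  Mortality.Rate = c(0.2, 0.3, 0.3, 0.5),
  stringsAsFactors = FALSE
)

# split() cuts every column by state, so order() gets keys of equal length.
# by_state is a hypothetical name: a list with one sorted data frame per state.
by_state <- lapply(split(df, df$State), function(d) {
  # Rate first, hospital name as tie-breaker; na.last = NA drops rows with NA keys
  d[order(d$Mortality.Rate, d$Hospital.Name, na.last = NA), ]
})

by_state$AA
```

Each element of the list can then be indexed on its own, for example to pick the n-th ranked hospital in a state.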
To sort a data frame in R, use the order() function. By default, sorting is ascending. Prefix a numeric sorting variable with a minus sign to sort it in descending order.
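As a minimal sketch of that, assuming the data frame from the question (rebuilt here so the snippet runs on its own), order() can take several keys at once, which handles both the grouping and the tie-breaking. The names `sorted` and `sorted_desc` are only illustrative.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  Hospital.Name  = c("hospital_1", "hospital_2", "hospital_3", "hospital_4"),
  State          = c("AA", "AA", "BB", "CC"),
  Mortality.Rate = c(0.2, 0.3, 0.3, 0.5),
  stringsAsFactors = FALSE
)

# Sort by state, then by rate, then by hospital name to break ties
sorted <- df[order(df$State, df$Mortality.Rate, df$Hospital.Name), ]

# Descending rate within each state: negate the numeric key
sorted_desc <- df[order(df$State, -df$Mortality.Rate, df$Hospital.Name), ]
```

Because all keys come from the same data frame, they always have the same length, so the length mismatch seen with tapply() does not arise.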
What is the arrange() function in R? The arrange() function from the dplyr package reorders the rows of a data frame or table by column names, which are passed to the function as expressions.
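For the hospital data in the question, a dplyr version might look like the sketch below. It assumes dplyr is installed; the data frame is rebuilt from the question, and `sorted` and `sorted_desc` are illustrative names.

```r
library(dplyr)

# Rebuild the example data frame from the question
df <- data.frame(
  Hospital.Name  = c("hospital_1", "hospital_2", "hospital_3", "hospital_4"),
  State          = c("AA", "AA", "BB", "CC"),
  Mortality.Rate = c(0.2, 0.3, 0.3, 0.5),
  stringsAsFactors = FALSE
)

# Columns are given as bare names; later columns break ties in earlier ones
sorted <- arrange(df, State, Mortality.Rate, Hospital.Name)

# desc() reverses a single key, here the rate within each state
sorted_desc <- arrange(df, State, desc(Mortality.Rate), Hospital.Name)
```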
If we're already loading needless (for this specific operation) packages, here's one (data.table) that could be useful in the sense that it sorts the data by reference (without copying it and without needing <-) using the setorder or setkey functions:
library(data.table)
setorder(setDT(df), State, Mortality.Rate, Hospital.Name)
You could, though, mimic base R syntax and order the data while creating a copy (still with improved speed, because data.table calls its forder under the hood):
setDT(df)[order(State, Mortality.Rate, Hospital.Name)]
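The setkey route mentioned above is not shown, so here is a rough sketch of it, assuming data.table is installed. The table `dt` is a shuffled copy of the question's data, made up for this example so that the effect of the sort is visible.

```r
library(data.table)

# Shuffled copy of the question's data (dt is a name invented for this sketch)
dt <- data.table(
  Hospital.Name  = c("hospital_4", "hospital_2", "hospital_3", "hospital_1"),
  State          = c("CC", "AA", "BB", "AA"),
  Mortality.Rate = c(0.5, 0.3, 0.3, 0.2)
)

# setkey() sorts dt in place (ascending only) and marks the columns as its key
setkey(dt, State, Mortality.Rate, Hospital.Name)

key(dt)       # the key columns, in sort order
dt[.("AA")]   # keyed lookup of one state's rows, already ordered by rate
```

Unlike setorder, setkey only sorts in ascending order, so it fits when the sorted table will also be used for lookups or joins by state.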