I created a random forest and predicted the classes of my test set, which are living happily in a dataframe:
row.names  class
564028     1
275747     1
601137     0
922930     1
481988     1
...
The row.names attribute tells me which original row each prediction belongs to, from before various operations scrambled the row order during the process. So far so good.
Now I would like to get a general feel for the accuracy of my predictions. To do this, I need to take this dataframe and reorder it in ascending order according to the row.names attribute. This way, I can compare the observations, row-wise, to the labels, which I already know.
Forgive me for asking such a basic question, but for the life of me, I can't find a good source of information regarding how to do such a trivial task.
The documentation implores me to:
use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.
but this leaves me with nothing but NULL.
My question is, how can I use row.names, which has been loyally following me around in the various incarnations of dataframes throughout my workflow? Isn't this what it is there for?
By using bracket notation on an R data frame, we can select rows by column value, by index, by name, by condition, etc. You can also use the base R function subset() to get the same results. Besides these, the dplyr package provides another function, dplyr::filter(), to get rows from a data frame.
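As a minimal sketch of the bracket and subset() selections described above, using base R only: the data frame below, including its score column, is made up for illustration and is not the asker's data.

```r
# Hypothetical data frame; row names mimic the ones in the question,
# the score column is invented for this example
df <- data.frame(class = c(1, 1, 0),
                 score = c(0.9, 0.7, 0.2),
                 row.names = c("564028", "275747", "601137"))

by_index  <- df[2, ]                  # second row, selected by position
by_name   <- df["601137", ]           # row selected by its row name
by_cond   <- df[df$class == 1, ]      # rows where class equals 1
by_subset <- subset(df, class == 1)   # same rows via subset()
```

Both by_cond and by_subset keep the original row names, so the selected rows can still be traced back to their source rows.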
`.rowNamesDF<-` is a (non-generic replacement) function to set row names for data frames, with extra argument make.names.
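To illustrate how row names can be read back, here is a small sketch comparing row.names() with attr(x, "row.names"). The last part is only one possible explanation for a NULL result, and it is an assumption: the question does not show what kind of object attr() was called on. The pred object below is hypothetical.

```r
# Data frame with automatic row names 1, 2, 3
df <- data.frame(class = c(1, 1, 0))

chr_names <- row.names(df)           # always a character vector
int_names <- attr(df, "row.names")   # integer for automatic row names

# Assumption: if the predictions are a named factor rather than a data
# frame, there is no "row.names" attribute; the labels live in names()
pred <- factor(c(1, 1, 0))
names(pred) <- c("564028", "275747", "601137")

no_attr   <- attr(pred, "row.names") # NULL, since pred is not a data frame
vec_names <- names(pred)             # the labels of the named factor
```

If the object really is a data frame, row.names(df) is usually the simplest accessor, keeping in mind that it returns character strings.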
None of the other solutions would actually work.
It should be:
# Assuming the data frame is called df
df[order(as.numeric(row.names(df))), ]
because row names in R are character strings. When the as.numeric part is missing, the rows are arranged as 1, 10, 11, ... and so on.
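A small sketch of the difference, using made-up row names and classes so that the two orderings visibly diverge. The labels vector is hypothetical as well, standing in for the known labels mentioned in the question. Note the drop = FALSE: with a single-column data frame, df[i, ] would otherwise collapse to a plain vector and lose the row names.

```r
# Hypothetical predictions with scrambled row names
df <- data.frame(class = c(1, 0, 1, 1),
                 row.names = c("10", "2", "1", "11"))

# Ordering the row names as strings: "1", "10", "11", "2"
chr_sorted <- df[order(row.names(df)), , drop = FALSE]

# Ordering the row names as numbers: 1, 2, 10, 11
num_sorted <- df[order(as.numeric(row.names(df))), , drop = FALSE]

# Hypothetical known labels for rows 1, 2, 10, 11, in that order
labels <- c(1, 0, 1, 0)

# Share of predictions that match the labels row by row
accuracy <- mean(num_sorted$class == labels)
```

Once the rows are in numeric order, the row-wise comparison against the known labels is a single vectorised expression.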