How can I use the row.names attribute to order the rows of my dataframe in R?

I created a random forest and predicted the classes of my test set, which are living happily in a dataframe:

    row.names   class
    564028      1
    275747      1
    601137      0
    922930      1
    481988      1
    ...

The row.names attribute tells me which row was which before various operations scrambled the order of the rows along the way. So far so good.

Now I would like to get a general feel for the accuracy of my predictions. To do this, I need to reorder this dataframe in ascending order of the row.names attribute. This way, I can compare the predictions, row by row, to the labels, which I already know.

Forgive me for asking such a basic question, but for the life of me, I can't find a good source of information regarding how to do such a trivial task.

The documentation implores me to:

use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.

but this leaves me with nothing but NULL.

My question is, how can I use row.names which has been loyally following me around in the various incarnations of dataframes throughout my workflow? Isn't this what it is there for?

tumultous_rooster asked Nov 30 '13 02:11



1 Answer

None of the other solutions would actually work.

It should be:

    # Assuming the data frame is called df
    df[order(as.numeric(row.names(df))), ]

because row names in R are character strings. Without the as.numeric() conversion, order() sorts them as text, so the data are arranged as 1, 10, 11, ... and so on.
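As a rough illustration of the point above, here is a minimal sketch using a made-up data frame that mimics the one in the question. The name `df_sorted`, the toy `ids` vector, and the `drop = FALSE` argument are my own additions, not part of the original answer. `drop = FALSE` is there because a data frame with a single column would otherwise be simplified to a plain vector when indexed by row.

```r
# Hypothetical data frame resembling the question's predictions:
# the row names hold the original row identifiers, in scrambled order.
df <- data.frame(
  class = c(1, 1, 0, 1, 1),
  row.names = c("564028", "275747", "601137", "922930", "481988")
)

# Row names are stored as character strings, not numbers.
is.character(row.names(df))

# Convert the row names to numeric before ordering, so rows are sorted by value.
# drop = FALSE (an addition here) keeps the result a data frame even though
# df has only one column.
df_sorted <- df[order(as.numeric(row.names(df))), , drop = FALSE]
print(df_sorted)

# Why the conversion matters: character strings sort digit by digit.
ids <- c("2", "10", "1", "11")
sort(ids)              # text order
sort(as.numeric(ids))  # numeric order
```

Once the rows are back in their original order, the `class` column can be compared element by element against the known labels, assuming those labels are stored in the same original order.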

ToNoY answered Oct 23 '22 22:10