Is there an R function to sequentially assign a code to each value in a dataframe, in the order it appears within the dataset?

I have a table with a long list of aliased values like this:

> head(transmission9, 50)
# A tibble: 50 x 2
   In_Node  End_Node
   <chr>    <chr>   
 1 c4ca4238 2838023a
 2 c4ca4238 d82c8d16
 3 c4ca4238 a684ecee
 4 c4ca4238 fc490ca4
 5 28dd2c79 c4ca4238
 6 f899139d 3def184a

I would like to have R go through both columns and assign a number sequentially to each value, in the order that an aliased value appears in the dataset. I would like R to read across rows first, then down columns. For example, for the dataset above:

   In_Node  End_Node
   <chr>    <chr>   
 1  1       2
 2  1       3
 3  1       4
 4  1       5
 5  6       1
 6  7       8

Is this possible? Ideally, I'd also love to be able to generate a "key" which would match each sequential code to each aliased value, like so:

Code Value
1    c4ca4238
2    2838023a
3    d82c8d16
4    a684ecee
5    fc490ca4

Thank you in advance for the help!

asked Jul 15 '21 15:07 by gbg



2 Answers

You could do:

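df1 <- df  # work on a copy so the original data stays untouched
# Levels are the unique values in row-wise order of first appearance;
# unlist() supplies the values column by column, matching how df1[] is filled.
df1[] <- as.numeric(factor(unlist(df), levels = unique(c(t(df)))))
df1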
  In_Node End_Node
1       1        2
2       1        3
3       1        4
4       1        5
5       6        1
6       7        8
answered Sep 27 '22 22:09 by KU99


You can match against the unique values. For a single vector, the code is straightforward:

match(vec, unique(vec))
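
As a small illustration of the single-vector case (the sample values are taken from the question; the name codes is only an assumption for this sketch): unique() keeps the first occurrence of each value in order, and match() returns the position of each element within that list.

```r
# Sample values copied from the question's first rows
vec <- c("c4ca4238", "2838023a", "c4ca4238", "d82c8d16")

# unique(vec) keeps values in order of first appearance;
# match() then returns each element's position in that list
codes <- match(vec, unique(vec))
codes
# [1] 1 2 1 3
```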

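The requirement to read across each row before moving down to the next one makes this slightly tricky: unlist() flattens a data.frame column by column, so the unique values would come out in the wrong order. To get them in row-wise order, you need to transpose the values first. After that, match them.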

Finally, use [<- to assign the result back to a data.frame of the same shape as your original data (here x):

y = x
y[] = match(unlist(x), unique(c(t(x))))
y
  V2 V3
1  1  2
2  1  3
3  1  4
4  1  5
5  6  1
6  7  8

c(t(x)) is a bit of a hack:

  • t first converts the tibble to a matrix and then transposes it. If your tibble contains multiple data types, these will be coerced to a common type.
  • c(…) discards attributes. In particular, it drops the dimensions of the transposed matrix, i.e. it converts the matrix into a vector, with the values now in the correct order.
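
To make the difference in ordering concrete, here is a minimal sketch on three rows from the question. It is an illustration only; the names col_order and row_order are assumptions for this example, and a plain data.frame stands in for the tibble.

```r
# Three rows from the question's data, as a plain data.frame
x <- data.frame(
  In_Node  = c("c4ca4238", "c4ca4238", "28dd2c79"),
  End_Node = c("2838023a", "d82c8d16", "c4ca4238"),
  stringsAsFactors = FALSE
)

# unlist() walks down the first column, then down the second
col_order <- unname(unlist(x))

# t() converts to a matrix and transposes it; c() drops the dimensions,
# so the values come out row by row
row_order <- c(t(x))

unique(col_order)  # "28dd2c79" is seen second
unique(row_order)  # "28dd2c79" is seen last, as the question requires
```

Only the second ordering gives the codes the question asks for, which is why unique() is applied to c(t(x)) rather than to unlist(x).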
answered Sep 27 '22 21:09 by Konrad Rudolph
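
Neither answer builds the "key" the question also asks for. One possible sketch, assuming the same match/unique approach as above: the vector of unique values in row-wise order already is the key, since position i holds the value that receives code i. The names values and key are assumptions for this example, and the six rows are copied from the question.

```r
# The six rows shown in the question, as a plain data.frame
x <- data.frame(
  In_Node  = c("c4ca4238", "c4ca4238", "c4ca4238", "c4ca4238", "28dd2c79", "f899139d"),
  End_Node = c("2838023a", "d82c8d16", "a684ecee", "fc490ca4", "c4ca4238", "3def184a"),
  stringsAsFactors = FALSE
)

# Unique values in row-wise order of first appearance
values <- unique(c(t(x)))

# The key: code i corresponds to values[i]
key <- data.frame(Code = seq_along(values), Value = values,
                  stringsAsFactors = FALSE)

# Recode the data against the same vector so codes and key agree
y <- x
y[] <- match(unlist(x), values)

key
y
```

Computing values once and reusing it for both the key and the recoding keeps the two in sync; a value can be looked up later with key$Value[code].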