
Re-arrange multiple columns in a data set into one column using R

I would like to combine three columns in one of my data sets into one with variable name "al_anim" and remove any duplicates, rank the values (animal ids) from lowest to highest, and re-number each animal from 1 to N under the variable name "new_id".

 anim1 <- c(1456,2569,5489,1456,4587)
 anim2 <- c(6531,6987,6987,15487,6531)
 anim3 <- c(4587,6548,7894,3215,8542)
 mydf <- data.frame(anim1,anim2,anim3)

Any help would be very much appreciated!

Baz

asked Sep 13 '11 08:09 by baz



1 Answer

Using mydf from your example:

mydf <- data.frame(anim1, anim2, anim3)

Stack the data:

sdf <- stack(mydf)

Then compute the unique elements using unique():

uni <- unique(sdf[, "values"])

and then this will give each unique animal a new id:

new_id <- as.numeric(as.factor(sort(uni)))

which would give:

> new_id
 [1]  1  2  3  4  5  6  7  8  9 10 11

However, that is totally trivial; seq_along(uni) gets you there far more easily. So I wonder if you want:

newdf <- data.frame(anim = sort(uni), new_id = seq_along(uni))
merge(sdf, newdf, by.x = "values", by.y = "anim")

which gives:

> merge(sdf, newdf, by.x = "values", by.y = "anim")
   values   ind new_id
1    1456 anim1      1
2    1456 anim1      1
3    2569 anim1      2
4    3215 anim3      3
5    4587 anim1      4
6    4587 anim3      4
7    5489 anim1      5
8    6531 anim2      6
9    6531 anim2      6
10   6548 anim3      7
11   6987 anim2      8
12   6987 anim2      8
13   7894 anim3      9
14   8542 anim3     10
15  15487 anim2     11
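Note that merge() sorts its result by the key column by default, so the original row order of mydf is lost in the output above. If the aim is instead to keep the wide layout and simply swap each animal id for its new number, one possible sketch uses match() against the sorted unique ids. The names ids and recoded are illustrative and not part of the question; this is one reading of the request, not necessarily the intended one.

```r
# Rebuild the example data so the snippet runs on its own
anim1 <- c(1456, 2569, 5489, 1456, 4587)
anim2 <- c(6531, 6987, 6987, 15487, 6531)
anim3 <- c(4587, 6548, 7894, 3215, 8542)
mydf <- data.frame(anim1, anim2, anim3)

# Sorted vector of distinct animal ids across all three columns
ids <- sort(unique(unlist(mydf, use.names = FALSE)))

# Replace every id by its position in `ids` (1 = lowest id),
# keeping the original rows and columns in place
recoded <- mydf
recoded[] <- lapply(mydf, function(col) match(col, ids))
```

Here match() returns, for each value, its position in ids, which is exactly the rank of that animal among the distinct ids.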

Your Question is somewhat ambiguous; showing the expected result/output would clear that up.
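If what is wanted is simply a two-column lookup table with the variable names from the question, "al_anim" and "new_id", a minimal sketch could look like the following. It assumes one row per distinct animal is the desired shape, and it uses unlist() rather than stack() to flatten the columns; lookup is a hypothetical name chosen for illustration.

```r
# Rebuild the example data so the snippet runs on its own
anim1 <- c(1456, 2569, 5489, 1456, 4587)
anim2 <- c(6531, 6987, 6987, 15487, 6531)
anim3 <- c(4587, 6548, 7894, 3215, 8542)
mydf <- data.frame(anim1, anim2, anim3)

# Flatten all columns into one vector, drop duplicates, sort ascending
al_anim <- sort(unique(unlist(mydf, use.names = FALSE)))

# One row per distinct animal, numbered 1..N in ascending id order
lookup <- data.frame(al_anim = al_anim, new_id = seq_along(al_anim))
```

Because al_anim is already sorted when seq_along() is applied, new_id follows the ascending order of the animal ids.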

answered Sep 28 '22 08:09 by Gavin Simpson