Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binding columns with similar column names in the same dataframe in R

Tags:

r

I have a data frame that looks somewhat like this:

df <- data.frame(0:2, 1:3, 2:4, 5:7, 6:8, 2:4, 0:2, 1:3, 2:4)
colnames(df) <- rep(c('a', 'b', 'c'), 3)
> df
  a b c a b c a b c
1 0 1 2 5 6 2 0 1 2
2 1 2 3 6 7 3 1 2 3
3 2 3 4 7 8 4 2 3 4

There are multiple columns that have the same name. I would like to rearrange the data frame so that the columns with the same names combine into their own supercolumn, so that there are only unique column names left, for example:

> df
  a b c
1 0 1 2
2 1 2 3
3 2 3 4
4 5 6 2
5 6 7 3
6 7 8 4
7 0 1 2
8 1 2 3
9 2 3 4

Any thoughts on how to do this? Thanks in advance!

like image 255
tkvn Avatar asked Apr 04 '13 05:04

tkvn


People also ask

How do I merge two Dataframes with the same column names in R?

To combine two data frames with same columns in R language, call rbind() function, and pass the two data frames, as arguments. rbind() function returns the resulting data frame created from concatenating the given two data frames. For rbind() function to combine the given data frames, the column names must match.

Can you have columns with the same name in R?

Because, usually R dataframes do not allow exact same names (when you create them using data. frame() ). Which means dataframes should not have the same column names. In your example, when you do d3$a R can only show you the first column that have the name a .

How do I bind two data frames with different columns in R?

The bind_rows() method is used to combine data frames with different columns. The column names are number may be different in the input data frames. Missing columns of the corresponding data frames are filled with NA.

How do I bind two columns in R?

How do I concatenate two columns in R? To concatenate two columns you can use the <code>paste()</code> function. For example, if you want to combine the two columns A and B in the dataframe df you can use the following code: <code>df['AB'] <- paste(df$A, df$B)</code>.


2 Answers

This will do the trick, I suppose.

Explanation

df[,names(df) == 'a'] will select all columns with name a

unlist will convert above columns into 1 single vector

unname will remove some stray rownames given to these vectors.

unique(names(df)) will give you unique column names in df

sapply will apply the inline function to all values of unique(names(df))

> df
  a b c a b c a b c
1 0 1 2 5 6 2 0 1 2
2 1 2 3 6 7 3 1 2 3
3 2 3 4 7 8 4 2 3 4
> sapply(unique(names(df)), function(x) unname(unlist(df[,names(df)==x])))
      a b c
 [1,] 0 1 2
 [2,] 1 2 3
 [3,] 2 3 4
 [4,] 5 6 2
 [5,] 6 7 3
 [6,] 7 8 4
 [7,] 0 1 2
 [8,] 1 2 3
 [9,] 2 3 4
like image 119
CHP Avatar answered Oct 22 '22 14:10

CHP


My version:

library(reshape)
as.data.frame(with(melt(df), split(value, variable)))
  a b c
1 0 1 2
2 1 2 3
3 2 3 4
4 0 1 2
5 1 2 3
6 2 3 4
7 0 1 2
8 1 2 3
9 2 3 4

In the step using melt I transform the dataset:

> melt(df)
Using  as id variables
   variable value
1         a     0
2         a     1
3         a     2
4         b     1
5         b     2
6         b     3
7         c     2
8         c     3
9         c     4
10        a     0
11        a     1
12        a     2
13        b     1
14        b     2
15        b     3
16        c     2
17        c     3
18        c     4
19        a     0
20        a     1
21        a     2
22        b     1
23        b     2
24        b     3
25        c     2
26        c     3
27        c     4

Then I split up the value column for each unique level of variable using split:

$a
[1] 0 1 2 0 1 2 0 1 2

$b
[1] 1 2 3 1 2 3 1 2 3

$c
[1] 2 3 4 2 3 4 2 3 4

then this only needs an as.data.frame to become the data structure you need.

like image 37
Paul Hiemstra Avatar answered Oct 22 '22 15:10

Paul Hiemstra