I have a data frame that looks somewhat like this:
df <- data.frame(0:2, 1:3, 2:4, 5:7, 6:8, 2:4, 0:2, 1:3, 2:4)
colnames(df) <- rep(c('a', 'b', 'c'), 3)
> df
a b c a b c a b c
1 0 1 2 5 6 2 0 1 2
2 1 2 3 6 7 3 1 2 3
3 2 3 4 7 8 4 2 3 4
There are multiple columns that have the same name. I would like to rearrange the data frame so that the columns with the same names combine into their own supercolumn, so that there are only unique column names left, for example:
> df
a b c
1 0 1 2
2 1 2 3
3 2 3 4
4 5 6 2
5 6 7 3
6 7 8 4
7 0 1 2
8 1 2 3
9 2 3 4
Any thoughts on how to do this? Thanks in advance!
To combine two data frames that have the same columns in R, call rbind() and pass the two data frames as arguments. rbind() returns the data frame created by stacking the rows of the two inputs. For rbind() to combine the data frames, their column names must match.
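As a minimal sketch of the rbind() call described above (df1 and df2 are made-up data frames, not part of the question):

```r
# Two hypothetical data frames with identical column names
df1 <- data.frame(a = 0:2, b = 1:3, c = 2:4)
df2 <- data.frame(a = 5:7, b = 6:8, c = 2:4)

# rbind() stacks the rows of df2 under the rows of df1,
# matching the columns by name
stacked <- rbind(df1, df2)
print(stacked)
```

The result has six rows and the same three columns a, b and c.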
R data frames usually do not allow identical column names: data.frame() makes the names unique by default when it creates the data frame. In other words, a data frame is not expected to carry duplicate column names. In your example, d3$a can only show you the first column named a.
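A small sketch of that behaviour, using throwaway data frames (fixed and dup are illustrative names only):

```r
# By default data.frame() repairs duplicate names (check.names = TRUE)
fixed <- data.frame(a = 1:2, a = 3:4)
names(fixed)   # "a" "a.1"

# check.names = FALSE keeps the duplicated names as they are
dup <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)
names(dup)     # "a" "a"

# `$` only reaches the first column called a
dup$a          # 1 2

# Logical indexing on the names reaches every column called a
both <- dup[, names(dup) == "a"]
ncol(both)     # 2
```

This is why the answers below select columns with names(df) == x rather than with df$a.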
The bind_rows() function from the dplyr package is used to combine data frames with different columns. The column names and the number of columns may differ between the input data frames. Columns missing from one of the data frames are filled with NA.
How do I concatenate two columns in R? To concatenate two columns you can use the paste() function. For example, to combine the two columns A and B in the data frame df, you can use the following code: df['AB'] <- paste(df$A, df$B)
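For illustration, here is the paste() call on a tiny made-up data frame (df_ab and its contents are assumptions, not data from the question):

```r
# Hypothetical data frame with two character columns A and B
df_ab <- data.frame(A = c("x", "y"), B = c("1", "2"))

# paste() joins the values element by element, separated by a space
df_ab["AB"] <- paste(df_ab$A, df_ab$B)

# paste0() does the same without a separator
df_ab["AB_nosep"] <- paste0(df_ab$A, df_ab$B)
print(df_ab)
```

Note that this joins values within a row into one string; it does not stack columns on top of each other the way the question asks.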
This will do the trick, I suppose.
Explanation
df[,names(df) == 'a'] will select all columns with name a
unlist will convert the above columns into a single vector
unname will remove the stray names that unlist attaches to that vector (they would otherwise end up as row names)
unique(names(df)) will give you the unique column names in df
sapply will apply the inline function to each value of unique(names(df))
> df
a b c a b c a b c
1 0 1 2 5 6 2 0 1 2
2 1 2 3 6 7 3 1 2 3
3 2 3 4 7 8 4 2 3 4
> sapply(unique(names(df)), function(x) unname(unlist(df[,names(df)==x])))
a b c
[1,] 0 1 2
[2,] 1 2 3
[3,] 2 3 4
[4,] 5 6 2
[5,] 6 7 3
[6,] 7 8 4
[7,] 0 1 2
[8,] 1 2 3
[9,] 2 3 4
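The [1,] style row labels show that sapply returns a matrix here, not a data frame. If a data frame is needed, a sketch along these lines should work; stack_cols, m and result are names picked for this example only:

```r
# Rebuild the example data from the question
df <- data.frame(0:2, 1:3, 2:4, 5:7, 6:8, 2:4, 0:2, 1:3, 2:4)
colnames(df) <- rep(c("a", "b", "c"), 3)

# Stack every column that shares the name nm into one unnamed vector
stack_cols <- function(nm) {
  same_name <- df[, names(df) == nm, drop = FALSE]  # all columns called nm
  unname(unlist(same_name))                         # one long vector
}

m <- sapply(unique(names(df)), stack_cols)  # 9 x 3 matrix
result <- as.data.frame(m)                  # data frame with columns a, b, c
print(result)
```

The drop = FALSE is a defensive addition: it keeps the selection a data frame even when a name occurs only once.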
My version:
library(reshape)
as.data.frame(with(melt(df), split(value, variable)))
a b c
1 0 1 2
2 1 2 3
3 2 3 4
4 0 1 2
5 1 2 3
6 2 3 4
7 0 1 2
8 1 2 3
9 2 3 4
In the step using melt I transform the dataset:
> melt(df)
Using as id variables
variable value
1 a 0
2 a 1
3 a 2
4 b 1
5 b 2
6 b 3
7 c 2
8 c 3
9 c 4
10 a 0
11 a 1
12 a 2
13 b 1
14 b 2
15 b 3
16 c 2
17 c 3
18 c 4
19 a 0
20 a 1
21 a 2
22 b 1
23 b 2
24 b 3
25 c 2
26 c 3
27 c 4
Then I split up the value column for each unique level of variable using split:
$a
[1] 0 1 2 0 1 2 0 1 2
$b
[1] 1 2 3 1 2 3 1 2 3
$c
[1] 2 3 4 2 3 4 2 3 4
Then this only needs an as.data.frame to become the data structure you need.
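One thing to watch: the output above shows 0 1 2 in rows 4 to 6 of column a, whereas the data frame in the question has 5 6 7 there. That suggests the duplicated names were looked up by name and resolved to the first matching column each time. A possible base R sketch that groups the columns by position instead (groups and stacked are names chosen for this example; this is an alternative, not the answer's method):

```r
# Rebuild the example data from the question
df <- data.frame(0:2, 1:3, 2:4, 5:7, 6:8, 2:4, 0:2, 1:3, 2:4)
colnames(df) <- rep(c("a", "b", "c"), 3)

# split.default() treats the data frame as a list of columns and groups
# them by name; columns are picked by position, not looked up by name
groups <- split.default(df, names(df))

# Flatten each group of columns into one vector and rebuild a data frame
stacked <- as.data.frame(lapply(groups, unlist, use.names = FALSE))
print(stacked)
```

With the question's data this keeps the 5 6 7 and 6 7 8 blocks in rows 4 to 6.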