
How to duplicate columns in a data frame

Tags: dataframe, r

Is there any efficient way, without using for loops, to duplicate the columns in a data frame? For example, if I have the following data frame:

  Var1 Var2
1    1    0
2    2    0
3    1    1
4    2    1
5    1    2
6    2    2

And I specify that column Var1 should be repeated twice, and column Var2 three times, then I would like to get the following:

  Var1 Var1 Var2 Var2 Var2
1    1    1    0    0    0
2    2    2    0    0    0
3    1    1    1    1    1
4    2    2    1    1    1
5    1    1    2    2    2
6    2    2    2    2    2

Any help would be greatly appreciated!

jroberayalas asked Sep 10 '15 14:09




1 Answer

We can replicate the column names with rep and use the result as an index to duplicate the columns. When the same column is selected more than once, subsetting a data.frame makes the resulting names unique (via make.unique), so the duplicates in 'df2' get .1, .2 as suffixes. If we don't want that, we can remove the suffix with sub.

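# Select Var1 twice and Var2 three times by indexing with repeated names
df2 <- df1[rep(names(df1), c(2, 3))]
# Strip only the trailing numeric suffix (.1, .2, ...) added by make.unique
names(df2) <- sub('\\.[0-9]+$', '', names(df2))
df2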
#  Var1 Var1 Var2 Var2 Var2
#1    1    1    0    0    0
#2    2    2    0    0    0
#3    1    1    1    1    1
#4    2    2    1    1    1
#5    1    1    2    2    2
#6    2    2    2    2    2
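
The suffix-stripping step relies on the original names not already ending in a dot followed by digits. A minimal sketch of a variant that avoids the regular expression by assigning the repeated names directly; the toy data frame with dotted names and the variables times and idx are assumptions for illustration, not part of the question:

```r
# Toy data frame whose column names already contain dots
# (hypothetical data, not the data frame from the question).
df1 <- data.frame(x.raw = 1:3, y.raw = c(0, 1, 2))

times <- c(2, 3)                 # copies wanted for each column, in column order
idx   <- rep(names(df1), times)  # repeated names used as the column index

df2 <- df1[idx]                  # subsetting makes the names unique (x.raw, x.raw.1, ...)
names(df2) <- idx                # put the exact repeated names back
df2
```

Because the names are taken from idx rather than recovered by pattern matching, a column such as x.raw keeps its full name on every copy.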

Or, as @Frank mentioned in the comments, we can also index by column position:

`[.noquote`(df1, c(1, 1, 2, 2, 2))
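
The position vector c(1, 1, 2, 2, 2) does not have to be typed by hand. As a rough sketch, it can be built with rep and wrapped in a small function; dup_cols is a hypothetical helper name, not something from base R or from the comments, and it uses ordinary data frame subsetting rather than the noquote method:

```r
# Hypothetical helper: repeat column i of `df` times[i] times,
# keeping the original name on every copy.
dup_cols <- function(df, times) {
  stopifnot(length(times) == ncol(df))
  idx <- rep(seq_along(df), times)  # c(1, 1, 2, 2, 2) when times = c(2, 3)
  out <- df[idx]                    # names are made unique here
  names(out) <- names(df)[idx]      # restore the repeated names
  out
}

# Data frame from the question
df1 <- data.frame(Var1 = c(1, 2, 1, 2, 1, 2), Var2 = c(0, 0, 1, 1, 2, 2))
df2 <- dup_cols(df1, c(2, 3))
df2
```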
akrun answered Sep 20 '22 02:09