Is there any efficient way, without using for loops, to duplicate the columns in a data frame? For example, if I have the following data frame:
  Var1 Var2
1    1    0
2    2    0
3    1    1
4    2    1
5    1    2
6    2    2
If I specify that column Var1 should be repeated twice and column Var2 three times, I would like to get the following:
  Var1 Var1 Var2 Var2 Var2
1    1    1    0    0    0
2    2    2    0    0    0
3    1    1    1    1    1
4    2    2    1    1    1
5    1    1    2    2    2
6    2    2    2    2    2
Any help would be greatly appreciated!
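For reference, the example data frame can be rebuilt with the short sketch below. The name df1 matches the answer that follows; integer column types are an assumption, since the question only shows the printed values. The requested counts are stored in a named vector, times, which is an illustrative name and not part of the original question.

```r
# Rebuild the example data frame shown in the question.
# Assumption: the columns are integers; the question does not state their types.
df1 <- data.frame(
  Var1 = rep(1:2, times = 3),   # 1 2 1 2 1 2
  Var2 = rep(0:2, each = 2)     # 0 0 1 1 2 2
)

# Hypothetical name for the requested repetitions, keyed by column name
times <- c(Var1 = 2, Var2 = 3)

print(df1)
```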
We can replicate the column names with rep() and use the result as a column index to duplicate the columns. When columns are selected this way, the data.frame method for [ makes duplicated column names unique with make.unique, which appends .1, .2, ... as suffixes to the repeated names in 'df2'. If we don't want the suffixes, we can remove them with sub.
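# repeat Var1 twice and Var2 three times by indexing with replicated names
df2 <- df1[rep(names(df1), c(2,3))]
# remove everything from the first dot onward, i.e. the .1/.2 suffixes
names(df2) <- sub('\\..*', '', names(df2))
df2
#  Var1 Var1 Var2 Var2 Var2
#1    1    1    0    0    0
#2    2    2    0    0    0
#3    1    1    1    1    1
#4    2    2    1    1    1
#5    1    1    2    2    2
#6    2    2    2    2    2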
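Because sub('\\..*', '', ...) strips everything after the first dot, it would also truncate column names that already contain a dot (for example a hypothetical my.var). A minimal base R sketch that avoids the regex assigns the replicated names directly instead of cleaning up the make.unique suffixes. rep_cols is a hypothetical helper name, not part of any package, and the example data assumes integer columns.

```r
# Example data from the question (integer columns are an assumption)
df1 <- data.frame(Var1 = rep(1:2, times = 3), Var2 = rep(0:2, each = 2))

# Selecting columns with `[` passes duplicated names through make.unique()
names(df1[rep(names(df1), c(2, 3))])
# [1] "Var1"   "Var1.1" "Var2"   "Var2.1" "Var2.2"

# Hypothetical helper: repeat each column of `df` according to `times`,
# given either one count per column in column order or counts named by column.
rep_cols <- function(df, times) {
  if (!is.null(names(times))) {
    times <- times[names(df)]            # align named counts with column order
  }
  stopifnot(length(times) == ncol(df), !anyNA(times))
  idx <- rep(seq_along(df), times)       # e.g. 1 1 2 2 2
  out <- df[idx]                         # names become Var1, Var1.1, Var2, ...
  names(out) <- names(df)[idx]           # reassign the plain names, duplicates included
  out
}

df2 <- rep_cols(df1, c(Var1 = 2, Var2 = 3))
names(df2)
# [1] "Var1" "Var1" "Var2" "Var2" "Var2"
```

Passing unnamed counts such as c(2, 3) gives the same result as long as they follow the column order; named counts may be given in any order.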
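Or, as @Frank mentioned in the comments, we can call the noquote subsetting method directly with the column positions, which keeps the duplicated names without adding suffixes:
`[.noquote`(df1, c(1, 1, 2, 2, 2))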
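The `[.noquote` trick works because that base R method subsets the underlying list and then copies back the original attributes other than the names, so the data.frame class and row names survive while the repeated names are left alone. Calling an S3 method directly like this is a shortcut rather than a documented data frame interface, so the sketch below only shows how the hard-coded index can be built from per-column counts and what the result looks like. The data and the names times and df3 are assumptions for illustration.

```r
# Example data from the question (integer columns are an assumption)
df1 <- data.frame(Var1 = rep(1:2, times = 3), Var2 = rep(0:2, each = 2))

# Build the positional index from per-column counts instead of typing it out
times <- c(2, 3)
idx <- rep(seq_along(df1), times)   # 1 1 2 2 2

# `[.noquote` subsets the underlying list, then restores every original
# attribute except the names, so class and row names are kept and the
# duplicated names are not passed through make.unique()
df3 <- `[.noquote`(df1, idx)

is.data.frame(df3)
# [1] TRUE
names(df3)
# [1] "Var1" "Var1" "Var2" "Var2" "Var2"
```

One pitfall of duplicated names: df3$Var2 returns only the first column named Var2, so later code should select repeated columns by position.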