R merge without duplicating columns

Tags: merge, dataframe, r

I have two data frames. For example:

library(xlsx)
csvData <- read.csv("myData.csv")
# read.xlsx() needs a sheet; sheetIndex = 1 assumes the data is on the first sheet
xlsData <- read.xlsx("myData.xlsx", sheetIndex = 1)

csvData looks like this:

Period  CPI     VIX
1       0.029   31.740
2       0.039   32.840
3       0.028   34.720
4       0.011   43.740
5       -0.003  35.310
6       0.013   26.090
7       0.032   28.420
8       0.022   45.080

xlsData looks like this:

Period  CPI     DJIA
1       0.029   12176
2       0.039   10646
3       0.028   11407
4       0.011   9563
5       -0.003  10708
6       0.013   10776
7       0.032   9384
8       0.022   7774

When I merge this data, the CPI column is duplicated and a suffix is appended to each copy's header, which is problematic (my real data frames have many more columns).

mergedData <- merge(csvData, xlsData, by = "Period")

mergedData:

Period  CPI.x   VIX     CPI.y   DJIA
1       0.029   31.740  0.029   12176
2       0.039   32.840  0.039   10646
3       0.028   34.720  0.028   11407
4       0.011   43.740  0.011   9563
5       -0.003  35.310  -0.003  10708
6       0.013   26.090  0.013   10776
7       0.032   28.420  0.032   9384
8       0.022   45.080  0.022   7774

I want to merge the data frames without duplicating columns with the same name. For example, I want this kind of output:

Period  CPI     VIX     DJIA
1       0.029   31.740  12176
2       0.039   32.840  10646
3       0.028   34.720  11407
4       0.011   43.740  9563
5       -0.003  35.310  10708
6       0.013   26.090  10776
7       0.032   28.420  9384
8       0.022   45.080  7774

I don't want to add extra 'by' arguments or drop columns from one of the data frames by hand, because too many columns are duplicated across the two. I'm looking for a dynamic way to drop those duplicated columns during the merge.

Thanks!

ch-pub asked Jun 27 '14 17:06



1 Answer

You can also select the specific columns of interest by name before merging. This is useful if you need just one or a few columns from a large data frame.

Period <- seq(1,8)
CPI <- seq(11,18)
VIX <- seq(21,28)
DJIA <- seq(31,38)
Other1 <- letters[1:8]
Other2 <- letters[2:9]
Other3 <- letters[3:10]

df1 <- data.frame(Period, CPI, VIX)
df2 <- data.frame(Period, CPI, Other1, DJIA, Other2, Other3)

merge(df1,df2[c("Period","DJIA")],by="Period") 

> merge(df1,df2[c("Period","DJIA")],by="Period")
  Period CPI VIX DJIA
1      1  11  21   31
2      2  12  22   32
3      3  13  23   33
4      4  14  24   34
5      5  15  25   35
6      6  16  26   36
7      7  17  27   37
8      8  18  28   38
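
The question asks for a dynamic way to do this, so here is a minimal sketch of the same subsetting idea with the column list computed rather than typed out. It assumes "Period" is the only key and that columns sharing a name hold the same values in both data frames, so the copy in the second one can safely be dropped. The names key, new_cols, and merged are made up for this illustration.

```r
# Toy data in the same shape as df1 and df2 above
df1 <- data.frame(Period = 1:8, CPI = 11:18, VIX = 21:28)
df2 <- data.frame(Period = 1:8, CPI = 11:18,
                  Other1 = letters[1:8], DJIA = 31:38)

key <- "Period"  # assumed join key

# Columns that exist only in df2; anything df1 already has is skipped
new_cols <- setdiff(names(df2), names(df1))

# Keep the key plus the new columns from df2, then merge as usual
merged <- merge(df1, df2[c(key, new_cols)], by = key)

names(merged)
```

With this toy data the result has the columns Period, CPI, VIX, Other1, and DJIA, with no .x or .y suffixes. Another option is to call merge(df1, df2) with no by argument: the default is to join on every column name the two data frames share, which also avoids the suffixes, but rows then match only where all of those shared columns are exactly equal, which may be fragile for floating-point columns such as CPI.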
Roasty247 answered Oct 17 '22 07:10