I have two data frames. For example:
library(xlsx)
csvData <- read.csv("myData.csv")
# read.xlsx() needs a sheet to read; the first sheet is assumed here
xlsData <- read.xlsx("myData.xlsx", sheetIndex = 1)
csvData looks like this:
Period CPI VIX
1 0.029 31.740
2 0.039 32.840
3 0.028 34.720
4 0.011 43.740
5 -0.003 35.310
6 0.013 26.090
7 0.032 28.420
8 0.022 45.080
xlsData looks like this:
Period CPI DJIA
1 0.029 12176
2 0.039 10646
3 0.028 11407
4 0.011 9563
5 -0.003 10708
6 0.013 10776
7 0.032 9384
8 0.022 7774
When I merge these data frames, the CPI column is duplicated and a suffix is added to its header, which is a problem because my real data frames have many more columns.
mergedData <- merge(csvData, xlsData, by = "Period")
mergedData:
Period CPI.x VIX CPI.y DJIA
1 0.029 31.740 0.029 12176
2 0.039 32.840 0.039 10646
3 0.028 34.720 0.028 11407
4 0.011 43.740 0.011 9563
5 -0.003 35.310 -0.003 10708
6 0.013 26.090 0.013 10776
7 0.032 28.420 0.032 9384
8 0.022 45.080 0.022 7774
I want to merge the data frames without duplicating columns with the same name. For example, I want this kind of output:
Period CPI VIX DJIA
1 0.029 31.740 12176
2 0.039 32.840 10646
3 0.028 34.720 11407
4 0.011 43.740 9563
5 -0.003 35.310 10708
6 0.013 26.090 10776
7 0.032 28.420 9384
8 0.022 45.080 7774
I don't want to list additional 'by' arguments or drop columns from one of the data frames by hand, because too many columns are duplicated in both. I'm looking for a dynamic way to drop those duplicated columns during the merge.
Thanks!
Base R provides the duplicated() and unique() functions for removing duplicates from a data frame (data.frame). With these two functions you can delete duplicate rows based on all columns, a single column, or selected columns.
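As a rough illustration of those three cases, here is a minimal sketch. The data frame df and its columns id and value are made up for this example and are not from the question.

```r
# Hypothetical data: row 3 repeats row 1 exactly; row 4 repeats only the id of row 2
df <- data.frame(id = c(1, 2, 1, 2), value = c("a", "b", "a", "c"))

# All columns: drop rows that are exact copies of an earlier row
unique_rows <- unique(df)

# Single column: keep only the first row seen for each id
unique_by_id <- df[!duplicated(df$id), ]

# Selected columns: drop rows duplicated on the chosen subset of columns
unique_by_cols <- df[!duplicated(df[c("id", "value")]), ]
```

Note that these functions work on rows. Removing a repeated column after a merge is a different problem.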
How do I concatenate two columns in R? To concatenate two columns, use the paste() function. For example, to combine columns A and B of the data frame df: df['AB'] <- paste(df$A, df$B). By default, paste() separates the values with a space.
To merge two data frames (datasets) horizontally, use the merge() function. In most cases, you join two data frames on one or more common key variables; by default this is an inner join.
The merge() function in R combines two data frames. The key columns being merged on should have the same type in both data frames. merge() is similar to a join in a relational database management system (RDBMS).
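One thing that may help with the question above: when no 'by' argument is given, merge() joins on every column name the two data frames share. The sketch below retypes the question's data by hand so it runs on its own. It assumes the shared CPI values agree exactly in both sources; with the default inner join, any row where they differ would drop out of the result.

```r
# Data from the question, entered by hand so the example is self-contained
csvData <- data.frame(
  Period = 1:8,
  CPI = c(0.029, 0.039, 0.028, 0.011, -0.003, 0.013, 0.032, 0.022),
  VIX = c(31.74, 32.84, 34.72, 43.74, 35.31, 26.09, 28.42, 45.08)
)
xlsData <- data.frame(
  Period = 1:8,
  CPI = c(0.029, 0.039, 0.028, 0.011, -0.003, 0.013, 0.032, 0.022),
  DJIA = c(12176, 10646, 11407, 9563, 10708, 10776, 9384, 7774)
)

# With no 'by', merge() joins on intersect(names(x), names(y)),
# here c("Period", "CPI"), so CPI is neither duplicated nor suffixed
mergedData <- merge(csvData, xlsData)
names(mergedData)
```

This avoids listing the shared columns, but it makes every shared column part of the join key, which may not be what you want if the copies can disagree.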
You can also index your specific column of interest by name. This is useful if you just need a single column/vector from a large data frame.
# Example data: df1 and df2 share the key "Period" and a duplicated "CPI" column
Period <- 1:8
CPI <- 11:18
VIX <- 21:28
DJIA <- 31:38
Other1 <- letters[1:8]
Other2 <- letters[2:9]
Other3 <- letters[3:10]
df1 <- data.frame(Period, CPI, VIX)
df2 <- data.frame(Period, CPI, Other1, DJIA, Other2, Other3)
# Keep only the key and the column of interest from df2, then merge on the key
merge(df1, df2[c("Period", "DJIA")], by = "Period")
Output:
Period CPI VIX DJIA
1 1 11 21 31
2 2 12 22 32
3 3 13 23 33
4 4 14 24 34
5 5 15 25 35
6 6 16 26 36
7 7 17 27 37
8 8 18 28 38
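If you don't want to name the columns to keep, a possible variation is to work out the shared column names and drop them from one side before merging. This is only a sketch: merge_drop_shared is a made-up helper name, not a base R function, and it assumes the copy of each shared column in the first data frame is the one you want to keep.

```r
# Hypothetical helper: merge x and y on 'key', keeping x's copy of any
# other column that appears in both data frames
merge_drop_shared <- function(x, y, key) {
  # Column names present in both data frames, excluding the join key
  shared <- setdiff(intersect(names(x), names(y)), key)
  # Drop the shared columns from y; drop = FALSE keeps y a data frame
  y_trimmed <- y[, !(names(y) %in% shared), drop = FALSE]
  merge(x, y_trimmed, by = key)
}

# Small example data in the style of the answer above
df1 <- data.frame(Period = 1:8, CPI = 11:18, VIX = 21:28)
df2 <- data.frame(Period = 1:8, CPI = 11:18,
                  Other1 = letters[1:8], DJIA = 31:38)

merged <- merge_drop_shared(df1, df2, key = "Period")
names(merged)
```

The result has a single CPI column with no .x or .y suffix, and the other columns of df2 are carried over unchanged.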