I would like to add a prefix to all columns that result from a left join.
left_join() can add a suffix when column names are identical between the two tables being joined. However, it has no option to always add this suffix when the names differ, and no option to add a prefix instead.
library(dplyr)
library(nycflights13)
flights2 <- flights %>% select(year:day, hour, origin, dest, tailnum, carrier)
airports2 <- airports
result <- flights2 %>% left_join(airports2, c("dest" = "faa")) %>% head()
The result:
Source: local data frame [6 x 14]
year month day hour origin dest tailnum carrier name
(int) (int) (int) (dbl) (chr) (chr) (chr) (chr) (chr)
1 2013 1 1 5 EWR IAH N14228 UA George Bush Intercontinental
2 2013 1 1 5 LGA IAH N24211 UA George Bush Intercontinental
3 2013 1 1 5 JFK MIA N619AA AA Miami Intl
4 2013 1 1 5 JFK BQN N804JB B6 NA
5 2013 1 1 5 LGA ATL N668DN DL Hartsfield Jackson Atlanta Intl
6 2013 1 1 5 EWR ORD N39463 UA Chicago Ohare Intl
Variables not shown: lat (dbl), lon (dbl), alt (int), tz (dbl), dst (chr)
Here, it is not possible to know, only from the join result, from which original table each column came.
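The suffix behaviour described above can be seen in a small sketch. The two toy tables and their column names are made up for illustration and are not part of nycflights13; only the clashing column `name` receives a suffix, while the other columns keep their original names.

```r
library(dplyr)

# Toy tables (hypothetical data): both have a non-key column called "name"
x <- tibble(id = 1:2, name = c("a", "b"), x_only = c(10, 20))
y <- tibble(id = 1:2, name = c("A", "B"), y_only = c(TRUE, FALSE))

# suffix is only applied to non-key columns whose names clash
joined <- left_join(x, y, by = "id", suffix = c(".x", ".y"))

# "name" is disambiguated, but x_only and y_only carry no table marker
names(joined)
```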
The purpose of adding this prefix is so that column names can be reliably derived from the table names and column names of data loaded from a relational database. The database structure loaded into and stored in R, together with the naming conventions of the relational database, will be used, for example, to identify primary and foreign keys. These will then be used to set up the joins and later to retrieve data from the join results.
I've found a similar question for MySQL, but not for R: "In a join, how to prefix all column names with the table it came from".
A straightforward way to achieve this would be to add the prefixes to the original tables before performing the join:
# add prefix before joining:
names(flights2) <- paste0("flights2.", names(flights2) )
names(airports2) <- paste0("airports2.", names(airports2) )
# in join, use names with prefixes
result <- flights2 %>% left_join(airports2, c("flights2.dest" = "airports2.faa") ) %>% head()
The result:
Source: local data frame [6 x 14]
flights2.year flights2.month flights2.day flights2.hour flights2.origin flights2.dest
(int) (int) (int) (dbl) (chr) (chr)
1 2013 1 1 5 EWR IAH
2 2013 1 1 5 LGA IAH
3 2013 1 1 5 JFK MIA
4 2013 1 1 5 JFK BQN
5 2013 1 1 5 LGA ATL
6 2013 1 1 5 EWR ORD
Variables not shown: flights2.tailnum (chr), flights2.carrier (chr), airports2.name (chr),
airports2.lat (dbl), airports2.lon (dbl), airports2.alt (int), airports2.tz (dbl),
airports2.dst (chr)
Now each column of the joined data frame can be referred to in the form tableName.columnName, which makes its table of origin explicit.
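As a variation on the same idea, the prefixing can be wrapped in a small helper so the original tables are not renamed in place. This is only a sketch: `prefix_cols` is a hypothetical helper name, the toy tables stand in for flights2 and airports2, and it assumes dplyr 1.0.0 or later for `rename_with()`.

```r
library(dplyr)

# Hypothetical helper: returns a copy of df with "<prefix>." added to every column name
prefix_cols <- function(df, prefix) {
  rename_with(df, function(nm) paste0(prefix, ".", nm))
}

# Toy stand-ins for flights2 and airports2 (made-up subset of values)
flights_small  <- tibble(dest = c("IAH", "BQN"), carrier = c("UA", "B6"))
airports_small <- tibble(faa = "IAH", name = "George Bush Intercontinental")

# Prefix both tables on the fly, then join on the prefixed key names
result <- prefix_cols(flights_small, "flights") %>%
  left_join(prefix_cols(airports_small, "airports"),
            by = c("flights.dest" = "airports.faa"))

# All columns that came from the airports table can be picked out by prefix
airports_part <- result %>% select(starts_with("airports."))
```

Because the originals stay untouched, the same tables can be reused in other joins with different prefixes.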