
Prefix all columns resulting from left_join() with original table names

I would like to add a prefix to all columns that result from a left join.

left_join() can add a suffix when column names are identical in the two tables being joined. However, it has no option to always add this suffix when the names differ, and no option to add a prefix instead.
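To make the suffix behaviour concrete, here is a minimal sketch on two toy data frames (`x`, `y` and their values are made up for illustration; they are not part of nycflights13). It shows that only the colliding non-key column receives a suffix:

```r
library(dplyr)

# Toy tables (hypothetical data) that share the non-key column "name".
x <- data.frame(id = 1:2, name = c("a", "b"), x_only = c(10, 20),
                stringsAsFactors = FALSE)
y <- data.frame(id = 1:2, name = c("A", "B"), y_only = c(TRUE, FALSE),
                stringsAsFactors = FALSE)

joined <- left_join(x, y, by = "id", suffix = c(".x", ".y"))

# Only "name" collides, so only it is suffixed;
# "x_only" and "y_only" keep their original names.
names(joined)
```

The columns `x_only` and `y_only` come back unchanged, which is why the suffix alone cannot identify the source table of every column.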

library(dplyr)
library(nycflights13)
flights2 <- flights %>% select(year:day, hour, origin, dest, tailnum, carrier)
airports2 <- airports

result <- flights2 %>% left_join(airports2, c("dest" = "faa")) %>% head()

The result:

Source: local data frame [6 x 14]

year month   day  hour origin  dest tailnum carrier                            name
(int) (int) (int) (dbl)  (chr) (chr)   (chr)   (chr)                           (chr)
1  2013     1     1     5    EWR   IAH  N14228      UA    George Bush Intercontinental
2  2013     1     1     5    LGA   IAH  N24211      UA    George Bush Intercontinental
3  2013     1     1     5    JFK   MIA  N619AA      AA                      Miami Intl
4  2013     1     1     5    JFK   BQN  N804JB      B6                              NA
5  2013     1     1     5    LGA   ATL  N668DN      DL Hartsfield Jackson Atlanta Intl
6  2013     1     1     5    EWR   ORD  N39463      UA              Chicago Ohare Intl
Variables not shown: lat (dbl), lon (dbl), alt (int), tz (dbl), dst (chr)

From the join result alone, it is not possible to tell which original table each column came from.

The purpose of adding this prefix is so that column names can be reliably computed from the table names and column names of data loaded from a relational database. The database structure, loaded into and stored in R, together with the naming conventions of the relational database, will be used, for example, to identify primary and foreign keys. These will then be used to set up the joins and, later, to retrieve data from the join results.
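As a rough illustration of what "computed from table names and column names" could look like, the sketch below builds qualified names and a join specification from a foreign-key description. The helper `qualified_name()`, the `fk` list, and the "table.column" convention are assumptions made for this example, not an existing API:

```r
# Hypothetical helper: build the qualified name "table.column".
qualified_name <- function(table, column) {
  paste0(table, ".", column)
}

# Hypothetical description of one foreign key, as it might be derived
# from the database structure stored in R.
fk <- list(table = "flights2", column = "dest",
           ref_table = "airports2", ref_column = "faa")

# Named character vector in the form that dplyr joins accept for `by`:
# the names refer to the left table, the values to the right table.
by <- setNames(
  qualified_name(fk$ref_table, fk$ref_column),
  qualified_name(fk$table, fk$column)
)

by
```

With prefixed tables, a vector like `by` could be passed directly as the `by` argument of left_join().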

I've found a similar question for MySQL, but not for R:

In a join, how to prefix all column names with the table it came from

asked Oct 24 '16 20:10 by Bobby



1 Answer

A straightforward way to achieve this would be to add the prefixes to the original tables before performing the join:

# add prefix before joining:
names(flights2) <- paste0("flights2.", names(flights2) )
names(airports2) <- paste0("airports2.", names(airports2) )

# in join, use names with prefixes
result <- flights2 %>% left_join(airports2, c("flights2.dest" = "airports2.faa") ) %>% head()

The result:

Source: local data frame [6 x 14]

flights2.year flights2.month flights2.day flights2.hour flights2.origin flights2.dest
(int)          (int)        (int)         (dbl)           (chr)         (chr)
1          2013              1            1             5             EWR           IAH
2          2013              1            1             5             LGA           IAH
3          2013              1            1             5             JFK           MIA
4          2013              1            1             5             JFK           BQN
5          2013              1            1             5             LGA           ATL
6          2013              1            1             5             EWR           ORD
Variables not shown: flights2.tailnum (chr), flights2.carrier (chr), airports2.name (chr),
airports2.lat (dbl), airports2.lon (dbl), airports2.alt (int), airports2.tz (dbl),
airports2.dst (chr)

Columns of the joined data frame can now be referred to in this manner: tableName.columnName
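A variation on the same idea, sketched here under some assumptions, prefixes copies of the tables so the originals keep their names. It relies on rename_with(), which is available in dplyr 1.0.0 and later; `add_prefix()` is a hypothetical helper, and the two small data frames are made-up stand-ins for the nycflights13 tables:

```r
library(dplyr)

# Toy stand-ins for the tables above (values chosen for illustration).
flights2 <- data.frame(dest = c("IAH", "BQN"), carrier = c("UA", "B6"),
                       stringsAsFactors = FALSE)
airports2 <- data.frame(faa = "IAH", name = "George Bush Intercontinental",
                        stringsAsFactors = FALSE)

# Hypothetical helper: return a copy of `df` with every column prefixed.
add_prefix <- function(df, prefix) {
  rename_with(df, function(nm) paste0(prefix, ".", nm))
}

result <- left_join(
  add_prefix(flights2, "flights2"),
  add_prefix(airports2, "airports2"),
  by = c("flights2.dest" = "airports2.faa")
)

# The originals are untouched; only the join result carries prefixes.
names(flights2)
names(result)

# Retrieve only the columns that came from airports2.
select(result, starts_with("airports2."))
```

Note that, by default, the key column of the right-hand table (here `airports2.faa`) is not kept in the result, so a lookup by prefix returns only its non-key columns.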

answered Sep 29 '22 10:09 by Bobby