Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Error with R dplyr left_join

So I've been trying to use left_join to get the columns of a new dataset onto my main dataset (called employee)

I've double checked the vector names and the cleaning that I've don't and nothing seems to work. Here is my code. Would appreciate any help.

job_codes <- read_csv("Quest_UMMS_JobCodes.csv")
job_codes <- job_codes %>%
clean_names() %>%
select(job_code, pos_desc = pos_des_desc)

job_codes$is_nurse <- str_detect(tolower(job_codes$pos_desc), "nurse") 

employee <- employee %>%
left_join(job_codes, by = "job_code")

The error I keep getting:Error in eval(substitute(expr), envir, enclos) : 'job_code' column not found in rhs, cannot join

here are the results of

names(job_code)
> names(job_codes)
[1] "job_code" "pos_desc" "is_nurse"

names(employee)
> names(employee)
 [1] "REC_NUM"             "ZIP"                 "STATE"              
 [4] "SEX"                 "EEO_CLASS"           "BIRTH_YEAR"         
 [7] "EMP_STATUS"          "PROCESS_LEVEL"       "DEPARTMENT"         
 [10] "JOB_CODE"            "UNION_CODE"          "SUPERVISOR"         
 [13] "DATE_HIRED"          "R_SHIFT"             "SALARY_CLASS"       
 [16] "EXEMPT_EMP"          "PAY_RATE"            "ADJ_HIRE_DATE"      
 [19] "ANNIVERS_DATE"       "TERM_DATE"           "NBR_FTE"            
 [22] "PENSION_PLAN"        "PAY_GRADE"           "SCHEDULE"           
 [25] "OT_PLAN_CODE"        "DECEASED"            "POSITION"           
 [28] "WORK_SCHED"          "SUPERVISOR_IND"      "FTE_TOTAL"          
 [31] "PRO_RATE_TOTAL"      "PRO_RATE_A_SAL"      "NEW_HIRE_DATE"      
 [34] "COUNTY"              "FST_DAY_WORKED"      "date_hired"         
 [37] "date_hired_adj"      "term_date"           "employment_duration"
 [40] "current"             "age"                 "emp_duration_years" 
 [43] "DESCRIPTION.x"       "PAY_STATUS.x"        "DESCRIPTION.y"      
 [46] "PAY_STATUS.y"      
like image 386
Randa Melhem Avatar asked Nov 10 '16 21:11

Randa Melhem


1 Answers

Now, after the OP has added the column names of both tables in the Q, it is evident that the column to join on is written in different ways (upper vs lower case).

If the column names are different, help("left_join") suggests:

To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.

So, in this case it should read

employee <- employee %>% left_join(job_codes, by = c("JOB_CODE" = "job_code"))
like image 93
Uwe Avatar answered Sep 18 '22 18:09

Uwe