Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging a data frame and a lookup table in r, retaining all records from data frame

I have a data frame of 59720 obs. that looks like below. I want to assign a MARKETNAME to each observation from a lookup table.

> data (a)

     DAY  HOUR LEAD Row.Count     DATE    ITIME  HOMEPHONE            CITY  STATE ZIPCODE     ZONENAME
1 Monday 13:00    1      9430 7/1/2013 13:42:51            FORT LAUDERDALE     FL  33315       68
2 Monday 13:00    1      9432 7/1/2013 13:43:50 xxxxx9802x  PLEASANT GROVE     AL  35127       82
3 Monday 13:00    1      9434 7/1/2013 13:46:18 5xxxx85x10      ORO VALLEY     AZ  85737       54
4 Monday  0:00    1      9435 7/1/2013  0:04:34 50xxxx1x364          SPOKANE    WA  99204      211
5 Monday 11:00    1      9436 7/1/2013 11:45:43 951xxxxx20        RIVERSIDE    CA  92507       31
6 Monday 11:00    1      9437 7/1/2013 11:46:26 760xxxxx679            VISTA    CA  92081      539

I have a lookup table of zip codes with 43126 unique zip codes that looks like this:

> data (b)

MARKETNAME            ZIPCODE
NEW YORK              00501
NEW YORK              00544
SPRINGFIELD-HOLYOKE   01001
SPRINGFIELD-HOLYOKE   01002
SPRINGFIELD-HOLYOKE   01003
SPRINGFIELD-HOLYOKE   01004

I wanted to simply assign the MARKETNAME to my dataset "a" comparing the ZIPCODE in "b". So I used

> c <- merge(a, b, by="ZIPCODE") .

It returned 58,972 obs. which meant I lost 748 obs. I did not want to lose any record from a so I changed my code as follows:

> c <- merge (a, b, by = "ZIPCODE" , all.x=TRUE) .

Strangely this returned 61,652 obs. instead of my expectation which was returning 59,720 obs. as per original a data frame with some NAs.

As per the documentation,

"if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output."

My interpretation of this is definitely wrong. Can someone please explain what I am doing wrong and how I can accomplish this simple task?

I referred : How to merge data frames and change element values based on certain conditions?, Subsetting and Merging from 2 Related Data Frames in r, how to merge two unequal size data frame in R but none of them are akin to my problem.

like image 700
vagabond Avatar asked Oct 21 '22 04:10

vagabond


1 Answers

I prefer join from plyr which by default is a left-join returning all matches of records in the first data frame.

c <- join(a, b, by="ZIPCODE")

like image 123
Ricky Avatar answered Oct 24 '22 11:10

Ricky