Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

merging data and receiving a big loss of data

I've been preparing my data and somehow I have way less data after merging my data sets.

Since I don't have the longitude and latitude in my data I've been using the following code after I downloaded the package zipcode (tel1 is my data containing zipcodes)

merge <- merge(zipcode,tel1,by.x=c('zip'),by.y=c('zip_code'))

Before merging I had 195956 observations, while after merging it dropped down to 180090, but I don't understand why.

In my opinion I just merged them where zip was equal to zip_code and I added the information from the dataset zipcode to my folder tel1

Afterward I wanted to remove the rows that contain NA because the merge couldn't define any numbers or whatever. I used this code

final <- result[complete.cases(result),]

Then my number of observations dropped down to 51006 which I just can't believe. There can't be so many mismatches in my data.

Is there any other code that I should use?

Afterwards I've been trying to delete the duplicates with the code

 last <- with(final,final[order(state,latitude,longitude),])

but the number of observations was consistent (51006).

What did I do wrong or is there a way to get my data into an excel file again after merging the data so I could manually check if there are really so many mismatches? Thanks

like image 235
user1741021 Avatar asked Oct 17 '12 14:10

user1741021


1 Answers

Can use the all argument to merge.

merge(zipcode, tel1, by.x='zip', by.y='zip_code', all.y=TRUE)

However, for rows where matches aren't found in the zipcode data, there will be NAs. Thus if you then na.rm or something to that effect, you will wind up with the same "data loss"

Check the zip codes for the rows where there are NAs in the lat and long columns after the merge:

tel1[is.na(tel1$latitude) | is.na(tel1$longitude),]

My guess is they aren't valid zip codes or the list of zipcodes you have is not complete.

like image 71
Justin Avatar answered Oct 23 '22 18:10

Justin