Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add ID column by group [duplicate]

Tags:

I want to create an unique ID in R based on two columns of latitude and longitude so that duplicated locations have the same cluster ID.

For example:

LAT        LONG    Cluster_ID 13.5330 -15.4180   1 13.5330 -15.4180   1 13.5330 -15.4180   1 13.5330 -15.4180   1 13.5330 -15.4170   2 13.5330 -15.4170   2 13.5330 -15.4170   2 13.5340 -14.9350   3 13.5340 -14.9350   3 13.5340 -15.9170   4 13.3670 -14.6190   5 
like image 474
jonestats Avatar asked Nov 26 '12 14:11

jonestats


People also ask

Does GROUP BY have duplicates?

GROUP BY only treats two rows as duplicates if all the column values in both the rows are the same. If even a single column value in either of the row is non-matching, they are treated as unique.

How to identify duplicate values in SQL?

One way to find duplicate records from the table is the GROUP BY statement. The GROUP BY statement in SQL is used to arrange identical data into groups with the help of some functions. i.e if a particular column has the same values in different rows then it will arrange these rows in a group.

How to include duplicate values in SQL?

To select duplicate values, you need to create groups of rows with the same values and then select the groups with counts greater than one. You can achieve that by using GROUP BY and a HAVING clause.


2 Answers

Here's one way using interaction.

d <- read.table(text='LAT LONG 13.5330 -15.4180  13.5330 -15.4180  13.5330 -15.4180  13.5330 -15.4180  13.5330 -15.4170  13.5330 -15.4170  13.5330 -15.4170  13.5340 -14.9350  13.5340 -14.9350  13.5340 -15.9170  13.3670 -14.6190', header=TRUE)  d <- transform(d, Cluster_ID = as.numeric(interaction(LAT, LONG, drop=TRUE)))  #       LAT    LONG Cluster_ID # 1  13.533 -15.418          2 # 2  13.533 -15.418          2 # 3  13.533 -15.418          2 # 4  13.533 -15.418          2 # 5  13.533 -15.417          3 # 6  13.533 -15.417          3 # 7  13.533 -15.417          3 # 8  13.534 -14.935          4 # 9  13.534 -14.935          4 # 10 13.534 -15.917          1 # 11 13.367 -14.619          5 

EDIT: Incorporated @Spacedman's suggestion to supply drop=TRUE to interaction.

like image 181
Matthew Plourde Avatar answered Sep 28 '22 18:09

Matthew Plourde


The data:

dat <- read.table(text=" LAT        LONG 13.5330 -15.4180 13.5330 -15.4180 13.5330 -15.4180 13.5330 -15.4180 13.5330 -15.4170 13.5330 -15.4170 13.5330 -15.4170 13.5340 -14.9350 13.5340 -14.9350 13.5340 -15.9170 13.3670 -14.6190", header = TRUE) 

These commands create an id variable starting with 1:

comb <- with(dat, paste(LAT, LONG)) within(dat, Cluster_ID <- match(comb, unique(comb))) 

The output:

      LAT    LONG Cluster_ID 1  13.533 -15.418          1 2  13.533 -15.418          1 3  13.533 -15.418          1 4  13.533 -15.418          1 5  13.533 -15.417          2 6  13.533 -15.417          2 7  13.533 -15.417          2 8  13.534 -14.935          3 9  13.534 -14.935          3 10 13.534 -15.917          4 11 13.367 -14.619          5 
like image 30
Sven Hohenstein Avatar answered Sep 28 '22 16:09

Sven Hohenstein