
Speed up pairwise counting of unique observations

I have a vector of objects (object) along with a corresponding vector of time frames (tframe) in which the objects were observed. For each unique pair of objects, I want to calculate the number of time frames in which both objects were observed.

I can write the code using for() loops, but it takes a long time to run as the number of unique objects increases. How might I change the code to speed up the run time?

Below is an example with 4 unique objects (in reality I have about 300). For example, objects a and c were both observed in time frames 1 and 2, so they get a count of 2. Objects b and d were never observed in the same time frame, so they get a count of 0.

object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)

uo <- unique(object)
n <- length(uo)

mpairs <- matrix(NA, nrow=n*(n-1)/2, ncol=3, dimnames=list(NULL, 
  c("obj1", "obj2", "sametf")))

row <- 0
for(i in 1:(n-1)) {
  for(j in (i+1):n) {
    row <- row+1
    mpairs[row, "obj1"] <- uo[i]
    mpairs[row, "obj2"] <- uo[j]
    # no. of time frames in which both objects in a pair were observed
    intwin <- intersect(tframe[object==uo[i]], tframe[object==uo[j]])
    mpairs[row, "sametf"] <- length(intwin)
  }
}

data.frame(object, tframe)
   object tframe
1       a      1
2       a      1
3       a      2
4       b      2
5       b      3
6       c      1
7       c      2
8       c      2
9       c      3
10      d      1

mpairs
     obj1 obj2 sametf
[1,] "a"  "b"  "1"   
[2,] "a"  "c"  "2"   
[3,] "a"  "d"  "1"   
[4,] "b"  "c"  "2"   
[5,] "b"  "d"  "0"   
[6,] "c"  "d"  "1"   
Jean V. Adams asked Jun 07 '16 21:06




1 Answer

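You can use tcrossprod() on a presence/absence table of objects by time frames to count the time frames that each pair of objects shares. You can then reshape the result to one row per pair, if required.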
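To see why the cross product yields the shared counts, it can help to look at the intermediate matrices. The following is a minimal sketch using the data from the question; the names `present` and `shared` are introduced here for illustration only.

```r
object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)

# Logical presence/absence matrix: one row per object, one column per time frame.
# Comparing with 0 collapses repeated observations within a time frame to TRUE.
present <- table(object, tframe) > 0

# tcrossprod(x) computes x %*% t(x). Entry [i, j] sums
# present[i, k] * present[j, k] over time frames k, which is the number
# of time frames in which both object i and object j were observed.
shared <- tcrossprod(present)

shared["a", "c"]  # 2, as in the question
shared["b", "d"]  # 0, as in the question
```

The diagonal of `shared` holds the number of distinct time frames in which each object was observed, which is why it is dropped before reshaping.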

Example

object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)

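# Shared time-frame counts for every pair of objects
# (uses the code from Jean's comment)
tab <- tcrossprod(table(object, tframe) > 0)

# Keep each pair once: set the lower triangle and the diagonal to NA,
# then reshape to long format (requires the reshape2 package)
tab[lower.tri(tab, TRUE)] <- NA
reshape2::melt(tab, na.rm = TRUE)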
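If you would rather avoid the reshape2 dependency, the same long-format result can probably be built in base R. This is a sketch, not part of the original answer: the names `idx` and `pairs` are hypothetical, and it assumes there are at least two unique objects.

```r
object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)

# Shared time-frame counts for every pair of objects
tab <- tcrossprod(table(object, tframe) > 0)

# Row/column index pairs (i, j) with i < j, one pair per row,
# in the same order as the nested loops in the question
idx <- t(combn(nrow(tab), 2))

# Indexing tab with a two-column matrix picks one cell per pair
pairs <- data.frame(
  obj1   = rownames(tab)[idx[, 1]],
  obj2   = rownames(tab)[idx[, 2]],
  sametf = tab[idx]
)
pairs
```

Note that table() sorts the object labels, whereas the loop in the question uses unique(object), which keeps order of first appearance. The two orders coincide for this example but may differ for other data. Unlike the character matrix built by the loop, `sametf` stays numeric here.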
user20650 answered Oct 17 '22 23:10
