Speed up pairwise counting of unique observations

I have a vector of objects (object) along with a corresponding vector of time frames (tframe) in which the objects were observed. For each unique pair of objects, I want to calculate the number of time frames in which both objects were observed.

I can write the code using for() loops, but it takes a long time to run as the number of unique objects increases. How might I change the code to speed up the run time?

Below is an example with 4 unique objects (in reality I have about 300). For example, objects a and c were both observed in time frames 1 and 2, so they get a count of 2. Objects b and d were never observed in the same time frame, so they get a count of 0.

object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)

uo <- unique(object)
n <- length(uo)

mpairs <- matrix(NA, nrow=n*(n-1)/2, ncol=3, dimnames=list(NULL, 
  c("obj1", "obj2", "sametf")))

row <- 0
for(i in 1:(n-1)) {
  for(j in (i+1):n) {
    row <- row+1
    mpairs[row, "obj1"] <- uo[i]
    mpairs[row, "obj2"] <- uo[j]
    # no. of time frames in which both objects in a pair were observed
    intwin <- intersect(tframe[object==uo[i]], tframe[object==uo[j]])
    mpairs[row, "sametf"] <- length(intwin)
  }
}

data.frame(object, tframe)
   object tframe
1       a      1
2       a      1
3       a      2
4       b      2
5       b      3
6       c      1
7       c      2
8       c      2
9       c      3
10      d      1

mpairs
     obj1 obj2 sametf
[1,] "a"  "b"  "1"   
[2,] "a"  "c"  "2"   
[3,] "a"  "d"  "1"   
[4,] "b"  "c"  "2"   
[5,] "b"  "d"  "0"   
[6,] "c"  "d"  "1"

You can use tcrossprod() (a matrix cross-product) on the object-by-time-frame presence table to get the counts of agreement. You can then reshape the result to one row per pair, if required.

Example

object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)

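# This will give you the counts: table(...) > 0 marks whether each object
# was seen in each time frame, and tcrossprod() multiplies that matrix by
# its transpose, so entry [i, j] is the number of shared time frames
# Use code from Jean's comment
tab <- tcrossprod(table(object, tframe)>0)

# Reshape the data: blank out the diagonal and lower triangle so each pair
# appears once, then melt to long format (requires the reshape2 package)
tab[lower.tri(tab, TRUE)] <- NA
reshape2::melt(tab, na.rm=TRUE)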
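
If you would rather not depend on reshape2, here is a rough base R sketch of the same idea, split into steps so it is easier to see why the cross-product gives the counts. The names inc, idx and pairs are only illustrative and are not part of the code above.

```r
object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)

# Presence/absence matrix: one row per object, one column per time frame.
# Comparing with 0 collapses repeated observations of an object in a frame.
inc <- table(object, tframe) > 0

# inc %*% t(inc): entry [i, j] counts the frames in which both i and j occur.
tab <- tcrossprod(inc)

# Base R alternative to reshape2::melt(): take the upper triangle only,
# so each unordered pair appears once and the diagonal is skipped.
idx <- which(upper.tri(tab), arr.ind = TRUE)
pairs <- data.frame(
  obj1   = rownames(tab)[idx[, "row"]],
  obj2   = colnames(tab)[idx[, "col"]],
  sametf = tab[idx],
  stringsAsFactors = FALSE
)

# Sort to match the order produced by the nested loops in the question.
pairs <- pairs[order(pairs$obj1, pairs$obj2), ]
rownames(pairs) <- NULL
pairs
```

On the example data this should give the same six counts as the loop (1, 2, 1, 2, 0, 1 for a-b, a-c, a-d, b-c, b-d, c-d), but with sametf stored as a number rather than a character string.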

Tags: performance, for-loop, r

Jean V. Adams

1 Answer

user20650
