Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient algorithm to compute the median of pariwise absolute sums of a sorted array

I'm trying to come up with a fast algorithm to compute the quantity b[i]= med |y_i+y_j|, 1<=j!=i<=n when the y_1,...,y_n are sorted already (so b[] is a vector of same length as y[]). I will assume that all elements of y[] are unique and that n is even.

So, the code below computes the b[i]'s the naive (O(n**2)) way: (I wrote this in R for convenience, but I'm language agnostic)

n<-30
a_fast<-b_slow<-rep(NA,n)
y<-sort(rnorm(n,100,1))
z<-y
for(i in 1:n){
    b_slow[i]<-median(abs(y[-i]+y[i]))
}   

I have a tentative proposal --below-- for doing it in O(n). But it only works if y[] contains positive numbers.

My question is: how should I change the fast algorithm to also work when y[] contains both positive and negative numbers? Is this even possible?

EDIT:

And the code below the (tentative) O(n) way (I wrote this in R for convenience, but I'm language agnostic)

tryA<-floor(1+(n-1)/2+1)
tryB<-floor(1+(n-1)/2)
medA<-y[tryA]
medB<-y[tryB]
for(i in 1:(tryA-1)){
        a_fast[i]<-medA+y[i]
}
for(i in tryA:n){
        a_fast[i]<-medB+y[i]
}

Simple example:

Simple, illustrative example. If we have a vector of length 4

-3, -1, 2, 4

Then, for example for i=1, the 3 absolute pairwise sums are

  4 1 1

and their median is 1.

Then, for example for i=2, the 3 absolute pairwise sums are

  4 1 3

and their median is 3.

Here is a longer example with both positive and negative y[]:

 -1.27 -0.69 -0.56 -0.45 -0.23  0.07  0.13  0.46  1.56  1.72

and here are my new b_slow[] (this is the ground thruth, computed the naive way):

1.20 0.92 1.00 1.01 0.79 0.53 0.56 0.53 1.33 1.49

but now, my new a_fast[] don't match no more:

 -1.20 -0.62 -0.49 -0.38 -0.16 -0.16 -0.10  0.23  1.33  1.49

EDIT:

Here is my implementation of Francis's solution (up to the point where we have two sorted array, the median of which is easy to compute). I did it in R to stay in the spirit of the question.

Nonetheless, I seem to be missing a correction factor for the index (the ww in the code below) so the code below is sometimes off by a little bit. This is because in the definition above we compute the medians over n-1 observations (i!=j).

 n<-100
 y<-rnorm(n)
 y<-sort(y)

 b<-rep(NA,n)
 #Naive --O(n**2)-- approch:
 for(i in 1:n){
     b[i]<-median(abs(y[-i]+y[i]))
 }

 k<-rep(NA,n)
 i<-1
 k[i]<-min(na.omit(c(which(y+y[i]>0)[1],n))) #binary search: O(log(n)) -- 
 for(i in 2:n){                  #O(n)
     k_prov<-k[i-1]
     while(y[k_prov]+y[i]>0 && k_prov>0)     k_prov<-k_prov-1
     k[i]<-max(k_prov+1,1)
     #for(i in 1:n){ should give the same result.
     #   k[i]<-which(y+y[i]>0)[1]
     #}
 }

 i<-sample(1:n,1)
 x1<--y[1:(k[i]-1)]-y[i]
 x2<-y[i]+y[n:k[i]]
 x3<-c(x1,x2)
 plot(x3)
 ww<-ifelse(i<k[i] & i>n/2,n/2+1,n/2)
 sort(x3)[ww]  #this can be computed efficiently: O(log(n))
 b[i]          #this is the O(n**2) result.
like image 463
user189035 Avatar asked May 15 '14 16:05

user189035


People also ask

How do you find the median in a sorted array?

Find the median of the two sorted arrays( The median of the array formed by merging both the arrays). Median: The middle element is found by ordering all elements in sorted order and picking out the one in the middle (or if there are two middle numbers, taking the mean of those two numbers).

How do you find the median of a sorted number?

The median is the middle value in a set of data. First, organize and order the data from smallest to largest. To find the midpoint value, divide the number of observations by two. If there are an odd number of observations, round that number up, and the value in that position is the median.

What is the best time complexity to get median of 2 sorted arrays?

In this basic approach to finding the median of two sorted arrays of different lengths, we have traversed the arrays and counted the first m + n sorted elements of the merged array. As we are traversing only the first m + n elements of the arrays, the time complexity is O(m + n).


1 Answers

Here is a O(Nxln(N)xln(N)) solution :

for all i :

1) find item k such as j<k <=> y[j]+y[i]<0 (dichotomy, O(ln(N)))

k separates two sorted lists : one above -y[i], the other below -y[i], for which the sign should be changed to get abs(y[i]+y[j]). Now, we are looking for the median of these lists.

From here, it is just the problem of finding the median of two sorted lists, repeated n times.

2)Let's pick the maximum (M=abs(y[1]-y[i]) or M=abs(y[size]-y[i])) and minimum (m around k) of these lists and restart a dichotomy (O(ln(N)). Let's start by picking the middle (M+m)/2...at any stage, let pick the middle...

3)Stage of this big dichotomy : How many items y[j]+y[i] are above (M+m)/2 in the first list ? Once again a dichotomy... O(ln(N)). How many items -y[j]-y[i] are above (M+m)/2 in the second list ? Guess what ? Dichotomy... Sum these two numbers. If it is above (size-1)/2, m=(M+m)/2. Otherwise M=(M+m)/2.

4)If m=M stop ! b[i]=m;

I guess somebody will come with a better solution...

Edit : I should thank @user189035 for his link to an O(ln(n+m)) algorithm to compute the median of two sorted lists. How to find the kth smallest element in the union of two sorted arrays?

Bye,

like image 87
francis Avatar answered Oct 12 '22 23:10

francis