Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python code for Earth mover's Distance

Tags:

python

image

I am looking for an Earth Mover's distance(or Fast EMD) implementation in python. Any clues on where to find it, I have looked enough on the web. I want to use it in an image retrieval project that I am doing. Thanks.

EDIT: I found a very nice solution using the pulp libararies. This page also has the instruction required to set it up.

like image 749
vishalv2050 Avatar asked Feb 24 '11 06:02

vishalv2050


People also ask

How is Wasserstein distance calculated?

The Earth Mover's Distance is Wasserstein with p = 1, usually denoted as W1 or 1-Wasserstein. The 2-Wasserstein metric is computed like 1-Wasserstein, except instead of summing the work values, you sum the squared work values and then take the square root.

Is Wasserstein distance a metric?

Abstract Wasserstein distances are metrics on probability distributions inspired by the problem of optimal mass transportation. Roughly speaking, they measure the minimal effort required to reconfigure the probability mass of one distribution in order to recover the other distribution.

How do you calculate EMD?

The amount of work done is the amount of dirt moved (the “flow”) times the distance moved. For example, the work required to move the 2 units of dirt in d1 a distance of 1 to hole h1 is 2 * 1 = 2. The EMD is the total amount of work divided by the total flow (amount of dirt).

Why Wasserstein is better than JS or KL divergence?

Also, the Wasserstein metric does not require both measures to be on the same probability space, whereas KL divergence requires both measures to be defined on the same probability space.


2 Answers

here is the python code for calculating EARTH MOVERS DISTANCE between two 1D distributions of equal length

def emd (a,b):
earth = 0
earth1 = 0
diff = 0
s= len(a)
su = []
diff_array = []
for i in range (0,s):
    diff = a[i]-b[i]
    diff_array.append(diff)
    diff = 0
for j in range (0,s):
    earth = (earth + diff_array[j])
    earth1= abs(earth)
    su.append(earth1)
emd_output = sum(su)/(s-1)
print(emd_output)
like image 38
Vikas Avatar answered Oct 09 '22 09:10

Vikas


There is an excellent implementation in OpenCv for Python. The name of the function is CalcEMD2 and a simple code to compare histograms of two images would look like this:

#Import OpenCv library
from cv2 import *

### HISTOGRAM FUNCTION #########################################################
def calcHistogram(src):
    # Convert to HSV
    hsv = cv.CreateImage(cv.GetSize(src), 8, 3)
    cv.CvtColor(src, hsv, cv.CV_BGR2HSV)

    # Extract the H and S planes
    size = cv.GetSize(src)
    h_plane = cv.CreateMat(size[1], size[0], cv.CV_8UC1)
    s_plane = cv.CreateMat(size[1], size[0], cv.CV_8UC1)
    cv.Split(hsv, h_plane, s_plane, None, None)
    planes = [h_plane, s_plane]

    #Define numer of bins
    h_bins = 30
    s_bins = 32

    #Define histogram size
    hist_size = [h_bins, s_bins]

    # hue varies from 0 (~0 deg red) to 180 (~360 deg red again */
    h_ranges = [0, 180]

    # saturation varies from 0 (black-gray-white) to 255 (pure spectrum color)
    s_ranges = [0, 255]

    ranges = [h_ranges, s_ranges]

    #Create histogram
    hist = cv.CreateHist([h_bins, s_bins], cv.CV_HIST_ARRAY, ranges, 1)

    #Calc histogram
    cv.CalcHist([cv.GetImage(i) for i in planes], hist)

    cv.NormalizeHist(hist, 1.0)

    #Return histogram
    return hist

### EARTH MOVERS ############################################################
def calcEM(hist1,hist2,h_bins,s_bins):

    #Define number of rows
    numRows = h_bins*s_bins

    sig1 = cv.CreateMat(numRows, 3, cv.CV_32FC1)
    sig2 = cv.CreateMat(numRows, 3, cv.CV_32FC1)    

    for h in range(h_bins):
        for s in range(s_bins): 
            bin_val = cv.QueryHistValue_2D(hist1, h, s)
            cv.Set2D(sig1, h*s_bins+s, 0, cv.Scalar(bin_val))
            cv.Set2D(sig1, h*s_bins+s, 1, cv.Scalar(h))
            cv.Set2D(sig1, h*s_bins+s, 2, cv.Scalar(s))

            bin_val = cv.QueryHistValue_2D(hist2, h, s)
            cv.Set2D(sig2, h*s_bins+s, 0, cv.Scalar(bin_val))
            cv.Set2D(sig2, h*s_bins+s, 1, cv.Scalar(h))
            cv.Set2D(sig2, h*s_bins+s, 2, cv.Scalar(s))

    #This is the important line were the OpenCV EM algorithm is called
    return cv.CalcEMD2(sig1,sig2,cv.CV_DIST_L2)

### MAIN ########################################################################
if __name__=="__main__":
    #Load image 1
    src1 = cv.LoadImage("image1.jpg")

    #Load image 1
    src2 = cv.LoadImage("image2.jpg")

    # Get histograms
    histSrc1= calcHistogram(src1)
    histSrc2= calcHistogram(src2)

    # Compare histograms using earth mover's
    histComp = calcEM(histSrc1,histSrc2,30,32)

    #Print solution
    print(histComp)

I tested a code very similar to the previous code with Python 2.7 and Python(x,y). If you want to learn more about Earth Mover's and you want to see an implementation using OpenCV and C++, you can read "Chapter 7: Histograms an Matching" of the book "Learning OpenCV" by Gary Bradski & Adrain Kaebler.

like image 62
Jaime Ivan Cervantes Avatar answered Oct 09 '22 07:10

Jaime Ivan Cervantes