Quantile/Median/2D binning in Python

Do you know a quick, elegant Python/SciPy/NumPy solution to the following problem? You have a set of x, y coordinates with associated values w (all 1D arrays). Bin x and y onto a 2D grid (size BINSxBINS) and calculate a quantile (such as the median) of the w values in each bin. The result should be a BINSxBINS 2D array holding the requested quantile for each bin.

This is easy to do with nested loops, but I am sure there is a more elegant solution.

Thanks, Mark

Mark asked Apr 24 '12 21:04



2 Answers

This is what I came up with; I hope it's useful. It's not necessarily cleaner or better than using a loop, but maybe it'll get you started toward something better.

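import numpy as np

# Bin widths along x and y
bin_width_x, bin_width_y = 1., 1.
x = np.array([1, 1, 2, 2, 3, 3, 3])
y = np.array([1, 1, 2, 2, 3, 3, 3])
w = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)

# You can get a flat bin number for each point like this
ix = (x // bin_width_x).astype(int)
iy = (y // bin_width_y).astype(int)
shape = (ix.max() + 1, iy.max() + 1)
bin_idx = np.ravel_multi_index((ix, iy), shape)

# You could get the mean by doing something like this
# (sum of w per bin / count per bin; empty bins give nan from 0/0):
mean = np.bincount(bin_idx, w) / np.bincount(bin_idx)

# Median is a bit harder: sort by bin, then find where each bin's run starts
order = bin_idx.argsort()
sorted_bins = bin_idx[order]
sorted_w = w[order]
edges = (sorted_bins[1:] != sorted_bins[:-1]).nonzero()[0] + 1
med_index = (np.r_[0, edges] + np.r_[edges, len(sorted_w)]) // 2
median = sorted_w[med_index]

# But that's not quite right: the values are not sorted within each bin, and
# for an even count it picks one element instead of averaging the middle two.
# So maybe:
median2 = [np.median(group) for group in np.split(sorted_w, edges)]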
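
To get from the per-bin list to the BINSxBINS array the question asks for, the split groups can be written back into a flat array and reshaped. Here is a minimal sketch of that idea, not a tuned implementation. It assumes equal-width bins spanning the data range, the helper name `binned_quantile_2d` and its parameters are made up for illustration, and cells with no points are left as NaN.

```python
import numpy as np

def binned_quantile_2d(x, y, w, bins, q=0.5):
    """Hypothetical helper: quantile q of w in each cell of a bins x bins grid."""
    x, y, w = np.asarray(x, float), np.asarray(y, float), np.asarray(w, float)
    # Equal-width bin edges spanning the data range (an assumption of this sketch)
    x_edges = np.linspace(x.min(), x.max(), bins + 1)
    y_edges = np.linspace(y.min(), y.max(), bins + 1)
    # Bin index per point; clip so points on the last edge land in the last bin
    ix = np.clip(np.searchsorted(x_edges, x, side="right") - 1, 0, bins - 1)
    iy = np.clip(np.searchsorted(y_edges, y, side="right") - 1, 0, bins - 1)
    flat = np.ravel_multi_index((ix, iy), (bins, bins))
    # Sort by bin so that each bin's values form one contiguous run
    order = np.argsort(flat, kind="stable")
    flat_sorted, w_sorted = flat[order], w[order]
    starts = np.flatnonzero(flat_sorted[1:] != flat_sorted[:-1]) + 1
    groups = np.split(w_sorted, starts)
    # Flat index of each occupied bin, in the same order as the groups
    occupied = flat_sorted[np.r_[0, starts]]
    result = np.full(bins * bins, np.nan)
    result[occupied] = [np.quantile(g, q) for g in groups]
    return result.reshape(bins, bins)

medians = binned_quantile_2d([1, 1, 2, 2, 3, 3, 3],
                             [1, 1, 2, 2, 3, 3, 3],
                             [1, 2, 3, 4, 5, 6, 7], bins=3)
```

This still loops over occupied bins in Python, so it is closer to a tidy version of the loop than a fully vectorized solution.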

Also take a look at numpy.histogram2d.
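
As far as I can tell, numpy.histogram2d gives counts and weighted sums (and therefore means) directly through its weights argument, but not medians. A small sketch with the same sample data; the bin count of 3 is an arbitrary choice for this example:

```python
import numpy as np

x = np.array([1, 1, 2, 2, 3, 3, 3])
y = np.array([1, 1, 2, 2, 3, 3, 3])
w = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)

# Number of points per cell, and sum of w per cell, on the same 3x3 grid
counts, x_edges, y_edges = np.histogram2d(x, y, bins=3)
sums, _, _ = np.histogram2d(x, y, bins=3, weights=w)

# Mean per cell; empty cells stay NaN instead of triggering a 0/0 warning
mean = np.full(counts.shape, np.nan)
np.divide(sums, counts, out=mean, where=counts > 0)
```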

Bi Rico answered Sep 21 '22 15:09


I'm just trying to do this myself, and it sounds like you want scipy.stats.binned_statistic_2d. With it you can compute the mean, median, standard deviation, or any user-defined function of the third parameter within the given bins.

I realise this question has already been answered, but I believe this is a good built-in solution.
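
A short sketch of how that call might look for the question's case. The sample data and the 3x3 grid are assumptions made for illustration:

```python
import numpy as np
from scipy.stats import binned_statistic_2d

x = np.array([1, 1, 2, 2, 3, 3, 3])
y = np.array([1, 1, 2, 2, 3, 3, 3])
w = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)

# Median of w in each cell of a 3x3 grid; empty cells come back as NaN
median, x_edges, y_edges, binnumber = binned_statistic_2d(
    x, y, w, statistic="median", bins=3
)

# Any callable can serve as the statistic, e.g. the 90th percentile
p90 = binned_statistic_2d(
    x, y, w, statistic=lambda v: np.percentile(v, 90), bins=3
).statistic
```

Passing a callable is what makes arbitrary quantiles possible, since the built-in string options cover only a fixed set of statistics such as the mean and median.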

Andrew Griffin answered Sep 19 '22 15:09