Given the following sample of my data:
import pandas as pd

data = {'Object': ['objA', 'objB', 'objC', 'objD', 'objE'],
        'Length': [10.1, 10.02, 7.4, 6.24, 5.99]}
df = pd.DataFrame(data)
df
Which results in the following dataframe:
Out[6]:
   Length Object
0   10.10   objA
1   10.02   objB
2    7.40   objC
3    6.24   objD
4    5.99   objE
I'd like to group the 'Length' column based on a +- tolerance. Something like the pseudocode below:
tolerance = .25
grouped = df.groupby(df['Length'] +- tolerance)
Which would result in a grouping similar to the one below:
{(10.10+-.25): [0L, 1L],
(7.40+-.25): [2L],
(6.24+-.25): [3L, 4L]}
Looking around, folks have suggested using pd.cut with predefined bins; however, given the true size of my dataset and the variability of the lengths, precomputing the bin ranges feels like a brute-force solution. Does anyone out there have a more elegant/fast/pandas/numpy-esque solution?
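For reference, the pd.cut approach I've seen suggested looks something like the sketch below (the 0.5 bin width and `include_lowest=True` are my own guesses at reasonable settings). Whether two lengths land in the same group depends entirely on where the precomputed edges happen to fall, which is why it feels brute-force:

```python
import numpy as np
import pandas as pd

data = {'Object': ['objA', 'objB', 'objC', 'objD', 'objE'],
        'Length': [10.1, 10.02, 7.4, 6.24, 5.99]}
df = pd.DataFrame(data)

# precompute fixed-width bins spanning the observed lengths
bins = np.arange(df['Length'].min(), df['Length'].max() + 0.5, 0.5)
cut = pd.cut(df['Length'], bins, include_lowest=True)

# group row indices by the bin each length falls into
groups = df.groupby(cut, observed=True).groups
```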
I'd suggest using the intervaltree package on PyPI, instead of a pandas/numpy-esque solution.
The idea is to add each length +/- tolerance interval to the interval tree, having the interval map to the associated object. Then, iterate over the lengths and query the interval tree. This will give you all of the objects that have a tolerance interval containing the queried length.
from intervaltree import IntervalTree

tolerance = 0.25
t = IntervalTree()

# map each length's +/- tolerance window to its object
for length, obj in zip(data['Length'], data['Object']):
    t[length - tolerance:length + tolerance] = obj

# query the tree at each length; collect every object whose window contains it
result = {}
for length in data['Length']:
    objs = [iv.data for iv in t[length]]
    result[length] = objs
The result dictionary is as follows:
{10.1: ['objA', 'objB'], 5.99: ['objD', 'objE'], 10.02: ['objA', 'objB'], 6.24: ['objD'], 7.4: ['objC']}
It's not quite in the format you specified, but it should be straightforward enough to make any changes to the format that you need.
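For example, one way to get closer to the interval-keyed, index-valued format from the question is a small post-processing step over result (the name obj_to_idx is mine). Note that overlapping windows still yield one entry per queried length — both 10.1 and 10.02 map to rows [0, 1] — so merging duplicate groups is still up to you:

```python
# `data`, `tolerance`, and `result` are the names used above
data = {'Object': ['objA', 'objB', 'objC', 'objD', 'objE'],
        'Length': [10.1, 10.02, 7.4, 6.24, 5.99]}
tolerance = 0.25
result = {10.1: ['objA', 'objB'], 5.99: ['objD', 'objE'],
          10.02: ['objA', 'objB'], 6.24: ['objD'], 7.4: ['objC']}

# row index of each object (the dataframe's index is just 0..n-1 here)
obj_to_idx = {obj: idx for idx, obj in enumerate(data['Object'])}

# key each group by its (length - tolerance, length + tolerance) interval
grouped = {(round(length - tolerance, 2), round(length + tolerance, 2)):
           sorted(obj_to_idx[o] for o in objs)
           for length, objs in result.items()}
```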