I have a df with numbers: <pre class="prettyprint lang-py prettyprint-override"><code>numbers = pd.DataFrame(columns=['number'], data=[ 50, 65, 75, 85, 90 ]) </code></pre> and a df with ranges (look up table): <pre class="prettyprint lang-py prettyprint-override"><code>ranges = pd.DataFrame( columns=['range','range_min','range_max'], data=[ ['A',90,100], ['B',85,95], ['C',70,80] ] ) </code></pre> I want to determine what range (in second table) a value (in the first table) falls in. Please note ranges overlap, and limits are inclusive. Also please note the vanilla dataframe above has 3 ranges, however this dataframe gets generated dynamically. It could have from 2 to 7 ranges. Desired result: <pre class="prettyprint lang-py prettyprint-override"><code>numbers = pd.DataFrame(columns=['number','detected_range'], data=[ [50,'out_of_range'], [65, 'out_of_range'], [75,'C'], [85,'B'], [90,'overlap'] * could be A or B * ]) </code></pre> I solved this with a for loop but this doesn't scale well to a big dataset I am using. Also code is too extensive and inelegant. See below: <pre class="prettyprint lang-py prettyprint-override"><code>numbers['detected_range'] = nan for i, row1 in number.iterrows(): for j, row2 in ranges.iterrows(): if row1.number<row2.range_min and row1.number>row2.range_max: numbers.loc[i,'detected_range'] = row1.loc[j,'range'] else if (other cases...): ...and so on... </code></pre> How could I do this?

You can use a bit of <code>numpy</code> vectorial operations to generate masks, and use them to <code>select</code> your labels: <pre class="prettyprint"><code>import numpy as np a = numbers['number'].values # numpy array of numbers r = ranges.set_index('range') # dataframe of min/max with labels as index m1 = (a>=r['range_min'].values[:,None]).T # is number above each min m2 = (a<r['range_max'].values[:,None]).T # is number below each max m3 = (m1&m2) # combine both conditions above # NB. the two operations could be done without the intermediate variables m1/m2 m4 = m3.sum(1) # how many matches? # 0 -> out_of_range # 2 -> overlap # 1 -> get column name # now we select the label according to the conditions numbers['detected_range'] = np.select([m4==0, m4==2], # out_of_range and overlap ['out_of_range', 'overlap'], # otherwise get column name default=np.take(r.index, m3.argmax(1)) ) </code></pre> output: <pre class="prettyprint"><code> number detected_range 0 50 out_of_range 1 65 out_of_range 2 75 C 3 85 B 4 90 overlap </code></pre> <h4>edit:</h4> It works with any number of intervals in ranges example output with extra<code>['D',50,51]</code>: <pre class="prettyprint"><code> number detected_range 0 50 D 1 65 out_of_range 2 75 C 3 85 B 4 90 overlap </code></pre>

determine the range of a value using a look up table

Tags:

python

pandas

I have a df with numbers:

numbers = pd.DataFrame(columns=['number'], data=[
50,
65,
75,
85,
90
])

and a df with ranges (look up table):

ranges = pd.DataFrame(
columns=['range','range_min','range_max'],
data=[
['A',90,100],
['B',85,95],
['C',70,80]
]
)

I want to determine what range (in second table) a value (in the first table) falls in. Please note ranges overlap, and limits are inclusive. Also please note the vanilla dataframe above has 3 ranges, however this dataframe gets generated dynamically. It could have from 2 to 7 ranges.

Desired result:

numbers = pd.DataFrame(columns=['number','detected_range'], data=[
[50,'out_of_range'],
[65, 'out_of_range'],
[75,'C'],
[85,'B'],
[90,'overlap'] * could be A or B *
])

I solved this with a for loop but this doesn't scale well to a big dataset I am using. Also code is too extensive and inelegant. See below:

numbers['detected_range'] = nan
for i, row1 in number.iterrows():
    for j, row2 in ranges.iterrows():
        if row1.number<row2.range_min and row1.number>row2.range_max:
             numbers.loc[i,'detected_range'] = row1.loc[j,'range']
        else if (other cases...):
              ...and so on...

How could I do this?

837

asked Oct 06 '21 08:10

Pab

1 Answers

You can use a bit of numpy vectorial operations to generate masks, and use them to select your labels:

import numpy as np

a = numbers['number'].values   # numpy array of numbers
r = ranges.set_index('range')  # dataframe of min/max with labels as index

m1 = (a>=r['range_min'].values[:,None]).T  # is number above each min
m2 = (a<r['range_max'].values[:,None]).T   # is number below each max
m3 = (m1&m2)                               # combine both conditions above
# NB. the two operations could be done without the intermediate variables m1/m2

m4 = m3.sum(1)                             # how many matches?
                                           # 0 -> out_of_range
                                           # 2 -> overlap
                                           # 1 -> get column name

# now we select the label according to the conditions
numbers['detected_range'] = np.select([m4==0, m4==2], # out_of_range and overlap
                                      ['out_of_range', 'overlap'],
                                      # otherwise get column name
                                      default=np.take(r.index, m3.argmax(1))
                                     )

output:

   number detected_range
0      50   out_of_range
1      65   out_of_range
2      75              C
3      85              B
4      90        overlap

edit:

It works with any number of intervals in ranges

example output with extra['D',50,51]:

   number detected_range
0      50              D
1      65   out_of_range
2      75              C
3      85              B
4      90        overlap

answered Oct 18 '22 16:10

mozway

Related questions
                            
                                How to wrap lines automatically in Python Spyder?
                            
                                Python destructor called in the "wrong" order based on reference count
                            
                                Is there a walrus operator in Node.Js?
                            
                                What became available_attrs on Django 3?
                            
                                Django new version 3.1, the settings file have some changes
                            
                                Find missing elements in a list created from a sequence of consecutive integers with duplicates in O(n)
                            
                                Addition between 'int' and custom class [duplicate]
                            
                                Find/replace in VS Code Jupyter Notebooks
                            
                                EmptyDataError: No columns to parse from file about streamlit
                            
                                Is there any way(by setting or extension) to view and use variables in VSCODE other than setting breakpoints. likes Spyder?
                            
                                Print the colname and rowname for values that meet certain condition
                            
                                Elementwise maximum of sparse Scipy matrix & vector with broadcasting
                            
                                SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary'
                            
                                Airflow webserver gettins valueError:Samesite
                            
                                Detect OS dark mode in Python
                            
                                How to check if a model is in train or eval mode in Pytorch?
                            
                                Combine two dictionaries with preference to one of them - [duplicate]
                            
                                If the f-string like string formatting available in Julia?
                            
                                TypeError: Cannot interpret '4' as a data type
                            
                                Using __new__ in inherited dataclasses

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With