<p>I am working on a postage application which is required to check an integer postcode against a number of postcode ranges, and return a different code based on which range the postcode matches against.</p> <p>Each code has more than one postcode range. For example, the <strong>M</strong> code should be returned if the postcode is within the ranges 1000-2429, 2545-2575, 2640-2686 or is equal to 2890.</p> <p>I could write this as:</p> <pre class="prettyprint"><code>if 1000 <= postcode <= 2429 or 2545 <= postcode <= 2575 or 2640 <= postcode <= 2686 or postcode == 2890: return 'M' </code></pre> <p>but this seems like a lot of lines of code, given that there are 27 returnable codes and 77 total ranges to check against. Is there a more efficient (and preferably more concise) method of matching an integer to all these ranges using Python?</p> <hr> <p>Edit: There's a lot of excellent solutions flying around, so I have implemented all the ones that I could, and benchmarked their performances.</p> <p>The environment for this program is a web service (Django-powered actually) which needs to check postcode region codes one-by-one, on the fly. My preferred implementation, then, would be one that can be quickly used for each request, and does not need any process to be kept in memory, or needs to process many postcodes in bulk.</p> <p>I tested the following solutions using <code>timeit.Timer</code> with default 1000000 repetitions using randomly generated postcodes each time.</p> <h3>IF solution (my original)</h3> <pre class="prettyprint"><code>if 1000 <= postcode <= 2249 or 2555 <= postcode <= 2574 or ...: return 'M' if 2250 <= postcode <= 2265 or ...: return 'N' ... </code></pre> <p>Time for 1m reps: 5.11 seconds.</p> <h3>Ranges in tuples (Jeff Mercado)</h3> <p>Somewhat more elegant to my mind and certainly easier to enter and read the ranges. Particularly good if they change over time, which is possible. But it did end up four times slower in my implementation.</p> <pre class="prettyprint"><code>if any(lower <= postcode <= upper for (lower, upper) in [(1000, 2249), (2555, 2574), ...]): return 'M' if any(lower <= postcode <= upper for (lower, upper) in [(2250, 2265), ...]): return 'N' ... </code></pre> <p>Time for 1m reps: 19.61 seconds.</p> <h3>Set membership (gnibbler)</h3> <p>As stated by the author, "it's only better if you are building the set once to check against many postcodes in a loop". But I thought I would test it anyway to see.</p> <pre class="prettyprint"><code>if postcode in set(chain(*(xrange(start, end+1) for start, end in ((1000, 2249), (2555, 2574), ...)))): return 'M' if postcode in set(chain(*(xrange(start, end+1) for start, end in ((2250, 2265), ...)))): return 'N' ... </code></pre> <p>Time for 1m reps: 339.35 seconds.</p> <h3>Bisect (robert king)</h3> <p>This one may have been a bit above my intellect level. I learnt a lot reading about the <code>bisect</code> module but just couldn't quite work out which parameters to give <code>find_ge()</code> to make a runnable test. I expect that it would be extremely fast with a loop of many postcodes, but not if it had to do the setup each time. So, with 1m repetitions of filling <code>numbers</code>, <code>edgepairs</code>, <code>edgeanswers</code> etc for just one postal region code (the <strong>M</strong> code with four ranges), but not actually running the <code>fast_solver</code>:</p> <p>Time for 1m reps: 105.61 seconds.</p> <h3>Dict (sentinel)</h3> <p>Using one dict per postal region code pre-generated, cPickled in a source file (106 KB), and loaded for each run. I was expecting much better performance from this method, but on my system at least, the IO really destroyed it. The server is a not-quite-blindingly-fast-top-of-the-line Mac Mini.</p> <p>Time for 1m reps: 5895.18 seconds (extrapolated from a 10,000 run).</p> <h3>The summary</h3> <p>Well, I was expecting someone to just give a simple 'duh' answer that I hadn't considered, but it turns out this is much more complicated (and even a little controversial).</p> <p>If every nanosecond of efficiency counted in this case, I would probably keep a separate process running which implemented one of the binary search or dict solutions and kept the result in memory for an extremely fast lookup. However, since the IF tree takes only five seconds to run a million times, which is plenty fast enough for my small business, that's what I'll end up using in my application.</p> <p>Thank you to everyone for contributing!</p>

<p>You can throw your ranges into tuples and put the tuples in a list. Then use <code>any()</code> to help you find if your value is within these ranges.</p> <pre class="prettyprint"><code>ranges = [(1000,2429), (2545,2575), (2640,2686), (2890, 2890)] if any(lower <= postcode <= upper for (lower, upper) in ranges): print('M') </code></pre>

Python: efficiently check if integer is within many ranges

Q: How do you check if an integers is between two numbers Python?

Python range() to check integer in between two numbers It can quite easily identify if the integer lies between two numbers or not. Here, we've called the range() function which includes the lower range (X) but discards the edge value, i.e., Y.

Tags:

python

I am working on a postage application which is required to check an integer postcode against a number of postcode ranges, and return a different code based on which range the postcode matches against.

Each code has more than one postcode range. For example, the M code should be returned if the postcode is within the ranges 1000-2429, 2545-2575, 2640-2686 or is equal to 2890.

I could write this as:

if 1000 <= postcode <= 2429 or 2545 <= postcode <= 2575 or 2640 <= postcode <= 2686 or postcode == 2890:     return 'M'

but this seems like a lot of lines of code, given that there are 27 returnable codes and 77 total ranges to check against. Is there a more efficient (and preferably more concise) method of matching an integer to all these ranges using Python?

Edit: There's a lot of excellent solutions flying around, so I have implemented all the ones that I could, and benchmarked their performances.

The environment for this program is a web service (Django-powered actually) which needs to check postcode region codes one-by-one, on the fly. My preferred implementation, then, would be one that can be quickly used for each request, and does not need any process to be kept in memory, or needs to process many postcodes in bulk.

I tested the following solutions using timeit.Timer with default 1000000 repetitions using randomly generated postcodes each time.

IF solution (my original)

if 1000 <= postcode <= 2249 or 2555 <= postcode <= 2574 or ...:     return 'M' if 2250 <= postcode <= 2265 or ...:     return 'N' ...

Time for 1m reps: 5.11 seconds.

Ranges in tuples (Jeff Mercado)

Somewhat more elegant to my mind and certainly easier to enter and read the ranges. Particularly good if they change over time, which is possible. But it did end up four times slower in my implementation.

if any(lower <= postcode <= upper for (lower, upper) in [(1000, 2249), (2555, 2574), ...]):     return 'M' if any(lower <= postcode <= upper for (lower, upper) in [(2250, 2265), ...]):     return 'N' ...

Time for 1m reps: 19.61 seconds.

Set membership (gnibbler)

As stated by the author, "it's only better if you are building the set once to check against many postcodes in a loop". But I thought I would test it anyway to see.

if postcode in set(chain(*(xrange(start, end+1) for start, end in ((1000, 2249), (2555, 2574), ...)))):     return 'M' if postcode in set(chain(*(xrange(start, end+1) for start, end in ((2250, 2265), ...)))):     return 'N' ...

Time for 1m reps: 339.35 seconds.

Bisect (robert king)

This one may have been a bit above my intellect level. I learnt a lot reading about the bisect module but just couldn't quite work out which parameters to give find_ge() to make a runnable test. I expect that it would be extremely fast with a loop of many postcodes, but not if it had to do the setup each time. So, with 1m repetitions of filling numbers, edgepairs, edgeanswers etc for just one postal region code (the M code with four ranges), but not actually running the fast_solver:

Time for 1m reps: 105.61 seconds.

Dict (sentinel)

Using one dict per postal region code pre-generated, cPickled in a source file (106 KB), and loaded for each run. I was expecting much better performance from this method, but on my system at least, the IO really destroyed it. The server is a not-quite-blindingly-fast-top-of-the-line Mac Mini.

Time for 1m reps: 5895.18 seconds (extrapolated from a 10,000 run).

The summary

Well, I was expecting someone to just give a simple 'duh' answer that I hadn't considered, but it turns out this is much more complicated (and even a little controversial).

If every nanosecond of efficiency counted in this case, I would probably keep a separate process running which implemented one of the binary search or dict solutions and kept the result in memory for an extremely fast lookup. However, since the IF tree takes only five seconds to run a million times, which is plenty fast enough for my small business, that's what I'll end up using in my application.

Thank you to everyone for contributing!

973

asked May 19 '11 05:05

tobygriffin

1 Answers

You can throw your ranges into tuples and put the tuples in a list. Then use any() to help you find if your value is within these ranges.

ranges = [(1000,2429), (2545,2575), (2640,2686), (2890, 2890)] if any(lower <= postcode <= upper for (lower, upper) in ranges):     print('M')

174

answered Sep 21 '22 19:09

Jeff Mercado

Related questions
                            
                                scikit's GridSearch and Python in general are not freeing memory
                            
                                Emacs Inferior Python shell shows the send message with each python-shell-send-region command
                            
                                AppEngine bulkloader, high replication store and python27 runtime
                            
                                Logistic Regression PMML won't Produce Probabilities
                            
                                Out-of-core processing of sparse CSR arrays
                            
                                How can I define algebraic data types in Python?
                            
                                Python setuptools: how to include a config file for distribution into <prefix>/etc
                            
                                SQLAlchemy: Hybrid expression with relationship
                            
                                Can I write native iPhone, Android, Windows, Blackberry apps using Python? [duplicate]
                            
                                Return results from multiple models with Django REST Framework
                            
                                Why isn't __new__ in Python new-style classes a class method?
                            
                                Plug in django-allauth as endpoint in django-rest-framework
                            
                                Difference between different ways to create celery task
                            
                                Flask App: Update progress bar while function runs
                            
                                Specifying dtype float32 with pandas.read_csv on pandas 0.10.1
                            
                                Is there a Python language specification?
                            
                                Searching for a string in a large text file - profiling various methods in python
                            
                                Has anyone used Sphinx to document a C++ project? [closed]
                            
                                matplotlib legend location numbers
                            
                                Project design / FS layout for large django projects [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python: efficiently check if integer is within many ranges

Tags:

python

IF solution (my original)

Ranges in tuples (Jeff Mercado)

Set membership (gnibbler)

Bisect (robert king)

Dict (sentinel)

The summary

tobygriffin

People also ask

1 Answers

Jeff Mercado

Recent Activity

Donate For Us

Python: efficiently check if integer is within *many* ranges

Tags:

python

IF solution (my original)

Ranges in tuples (Jeff Mercado)

Set membership (gnibbler)

Bisect (robert king)

Dict (sentinel)

The summary

tobygriffin

People also ask

1 Answers

Jeff Mercado

Related questions

Recent Activity

Donate For Us

Python: efficiently check if integer is within many ranges