logarithmically spaced integers

Tags:

Say I have a 10,000 pt vector that I want to take a slice of only 100 logarithmically spaced points. I want a function to give me integer values for the indices. Here's a simple solution that is simply using around + logspace, then getting rid of duplicates.

def genLogSpace( array_size, num ):
    lspace = around(logspace(0,log10(array_size),num)).astype(uint64)
    return array(sorted(set(lspace.tolist())))-1

ls=genLogspace(1e4,100)

print ls.size
>>84
print ls
array([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,   10,
         11,   13,   14,   15,   17,   19,   21,   23,   25,   27,   30,
         33,   37,   40,   44,   49,   54,   59,   65,   71,   78,   86,
         94,  104,  114,  125,  137,  151,  166,  182,  200,  220,  241,
        265,  291,  319,  350,  384,  422,  463,  508,  558,  613,  672,
        738,  810,  889,  976, 1071, 1176, 1291, 1416, 1555, 1706, 1873,
       2056, 2256, 2476, 2718, 2983, 3274, 3593, 3943, 4328, 4750, 5213,
       5721, 6279, 6892, 7564, 8301, 9111, 9999], dtype=uint64)

Notice that there were 16 duplicates, so now I only have 84 points.

Does anyone have a solution that will efficiently ensure the number of output samples is num? For this specific example, input values for num of 121 and 122 give 100 output points.

914

asked Sep 14 '12 04:09

andy

2 Answers

This is a bit tricky. You can't always get logarithmically spaced numbers. As in your example, first part is rather linear. If you are OK with that, I have a solution. But for the solution, you should understand why you have duplicates.

Logarithmic scale satisfies the condition:

s[n+1]/s[n] = constant

Let's call this constant r for ratio. For n of these numbers between range 1...size, you'll get:

1, r, r**2, r**3, ..., r**(n-1)=size

So this gives you:

r = size ** (1/(n-1))

In your case, n=100 and size=10000, r will be ~1.0974987654930561, which means, if you start with 1, your next number will be 1.0974987654930561 which is then rounded to 1 again. Thus your duplicates. This issue is present for small numbers. After a sufficiently large number, multiplying with ratio will result in a different rounded integer.

Keeping this in mind, your best bet is to add consecutive integers up to a certain point so that this multiplication with the ratio is no longer an issue. Then you can continue with the logarithmic scaling. The following function does that:

import numpy as np

def gen_log_space(limit, n):
    result = [1]
    if n>1:  # just a check to avoid ZeroDivisionError
        ratio = (float(limit)/result[-1]) ** (1.0/(n-len(result)))
    while len(result)<n:
        next_value = result[-1]*ratio
        if next_value - result[-1] >= 1:
            # safe zone. next_value will be a different integer
            result.append(next_value)
        else:
            # problem! same integer. we need to find next_value by artificially incrementing previous value
            result.append(result[-1]+1)
            # recalculate the ratio so that the remaining values will scale correctly
            ratio = (float(limit)/result[-1]) ** (1.0/(n-len(result)))
    # round, re-adjust to 0 indexing (i.e. minus 1) and return np.uint64 array
    return np.array(list(map(lambda x: round(x)-1, result)), dtype=np.uint64)

Python 3 update: Last line used to be return np.array(map(lambda x: round(x)-1, result), dtype=np.uint64) in Python 2

Here are some examples using it:

In [157]: x = gen_log_space(10000, 100)

In [158]: x.size
Out[158]: 100

In [159]: len(set(x))
Out[159]: 100

In [160]: y = gen_log_space(2000, 50)

In [161]: y.size
Out[161]: 50

In [162]: len(set(y))
Out[162]: 50

In [163]: y
Out[163]:
array([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,   11,
         13,   14,   17,   19,   22,   25,   29,   33,   38,   43,   49,
         56,   65,   74,   84,   96,  110,  125,  143,  164,  187,  213,
        243,  277,  316,  361,  412,  470,  536,  612,  698,  796,  908,
       1035, 1181, 1347, 1537, 1753, 1999], dtype=uint64)

And just to show you how logarithmic the results are, here is a semilog plot of the output for x = gen_log_scale(10000, 100) (as you can see, left part is not really logarithmic):

enter image description here

187

answered Sep 20 '22 20:09

Avaris

The approach in Avaris's answer of generating your log-spaced points directly, is definitely the way to go. But I thought it would be interesting to see how to pick the appropriate value to pass to logspace to get what you want.

The values in the array generated by logspace(0, k, n) are the numbers 10^{ik / (n−1)} for 0 ≤ i < n:

>>> numpy.logspace(0, 2, 10)
array([   1.        ,    1.66810054,    2.7825594 ,    4.64158883,
          7.74263683,   12.91549665,   21.5443469 ,   35.93813664,
         59.94842503,  100.        ])
>>> [10 ** (i * 2 / 9.0) for i in xrange(10)]
[1.0, 1.6681005372000588, 2.7825594022071245, 4.641588833612778,
 7.742636826811269, 12.91549665014884, 21.544346900318832,
 35.938136638046274, 59.94842503189409, 100.0]

This sequence consists of an initial segment where the values are more closely than unit spaced (and so there may be duplicates when they are rounded to the nearest integer), followed by a segment where the values are more widely than unit spaced and there are no duplicates.

>>> ' '.join('{:.2f}'.format(10 ** (i * 2 / 19.0)) for i in xrange(20))
'1.00 1.27 1.62 2.07 2.64 3.36 4.28 5.46 6.95 8.86 11.29 14.38 18.33 23.36
 29.76 37.93 48.33 61.58 78.48 100.00'
>>> [int(0.5 + 10 ** (i * 2 / 19.0)) for i in xrange(20)]
[1, 1, 2, 2, 3, 3, 4, 5, 7, 9, 11, 14, 18, 23, 30, 38, 48, 62, 78, 100]

The spacing between values is s(i) = 10^iK − 10^(i−1)K, where K = k / (n − 1). Let m be the smallest value such that s(m) ≥ 1. (m = 7 in the example above.) Then when duplicates are removed, there are exactly ⌊½ + 10^(m−1)K⌋ + n − m remaining numbers.

A bit of algebra finds:

m = ⌈ − log(1 − 10^−K) / K log 10 ⌉

Let's check that.

from math import ceil, floor, log

def logspace_size(k, n):
    """
    Return the number of distinct integers we'll get if we round
    `numpy.logspace(0, k, n)` to the nearest integers and remove
    duplicates.

    >>> logspace_size(4, 100)
    84
    >>> logspace_size(4, 121)
    100
    >>> from numpy import around, logspace
    >>> all(logspace_size(k, n) == len(set(around(logspace(0, k, n))))
    ...     for k in xrange(1,10) for n in xrange(2,100))
    True
    """
    K = float(k) / (n - 1)
    m = int(ceil(- log(1 - 10 ** -K) / (K * log(10))))
    if m < n:
        return int(0.5 + 10 ** ((m - 1) * K)) + n - m
    else:
        return int(0.5 + 10 ** ((n - 1) * K))

The doctests pass, so this looks good to me. So all you need to do is find n such that logspace_size(4, n) == 100. You could do this by binary chop or one of the scipy.optimize methods:

>>> f = lambda x, k, n:(logspace_size(k, x) - n)**2
>>> int(round(scipy.optimize.fmin(f, 100, args=(4,100), xtol=0.5, ftol=0.5)[0]))
Optimization terminated successfully.
         Current function value: 0.015625
         Iterations: 8
         Function evaluations: 17
122

answered Sep 20 '22 20:09

Gareth Rees

Related questions
                            
                                How do I use aggregation operators in a $match in MongoDB (for example $year or $dayOfMonth)?
                            
                                How to make an atom like animation using CSS3 animations?
                            
                                Keeping EC2 Free Instances Free [closed]
                            
                                What are the most important statistics to look at when deploying a Node.js web-application?
                            
                                Reference for Linux API like MSDN? [closed]
                            
                                Array from set: why does NSSet use allObjects, while NSOrderedSet uses array?
                            
                                What is the "const" keyword used for in Dart?
                            
                                PHP for loop vs. foreach with range [closed]
                            
                                Scala - Typeclass example. How to explain the benefits?
                            
                                django DateTimeField with UTC offset?
                            
                                CLOCK_MONOTONIC and pthread_mutex_timedlock / pthread_cond_timedwait
                            
                                UIButton not calling action in iOS 5 but works in iOS 6

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With