Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which of these is pythonic? and Pythonic vs. Speed

I'm new to python and just wrote this module level function:

def _interval(patt):
    """ Converts a string pattern of the form '1y 42d 14h56m'
    to a timedelta object.
    y - years (365 days), M - months (30 days), w - weeks, d - days,
    h - hours, m - minutes, s - seconds"""

    m = _re.findall(r'([+-]?\d*(?:\.\d+)?)([yMwdhms])', patt)

    args = {'weeks': 0.0,
            'days': 0.0,
            'hours': 0.0,
            'minutes': 0.0,
            'seconds': 0.0}

    for (n,q) in m:
        if q=='y':
            args['days'] += float(n)*365
        elif q=='M':
            args['days'] += float(n)*30
        elif q=='w':
            args['weeks'] += float(n)
        elif q=='d':
            args['days'] += float(n)
        elif q=='h':
            args['hours'] += float(n)
        elif q=='m':
            args['minutes'] += float(n)
        elif q=='s':
            args['seconds'] += float(n)

    return _dt.timedelta(**args)

My issue is with the for loop here i.e the long if elif block, and was wondering if there is a more pythonic way of doing it.
So I re-wrote the function as:

def _interval2(patt):

    m = _re.findall(r'([+-]?\d*(?:\.\d+)?)([yMwdhms])', patt)

    args = {'weeks': 0.0,
            'days': 0.0,
            'hours': 0.0,
            'minutes': 0.0,
            'seconds': 0.0}

    argsmap = {'y': ('days', lambda x: float(x)*365),
               'M': ('days', lambda x: float(x)*30),
               'w': ('weeks', lambda x: float(x)),
               'd': ('days', lambda x: float(x)),
               'h': ('hours', lambda x: float(x)),
               'm': ('minutes', lambda x: float(x)),
               's': ('seconds', lambda x: float(x))}

    for (n,q) in m:
        args[argsmap[q][0]] += argsmap[q][1](n)

    return _dt.timedelta(**args)

I tested the execution times of both the codes using timeit module and found that the second one took about 5-6 seconds longer (for the default number of repeats).

So my question is:
1. Which code is considered more pythonic?
2. Is there still a more pythonic was of writing this function?
3. What about the trade-offs between pythonicity and other aspects (like speed in this case) of programming?

p.s. I kinda have an OCD for elegant code.

EDITED _interval2 after seeing this answer:

argsmap = {'y': ('days', 365),
           'M': ('days', 30),
           'w': ('weeks', 1),
           'd': ('days', 1),
           'h': ('hours', 1),
           'm': ('minutes', 1),
           's': ('seconds', 1)}

for (n,q) in m:
    args[argsmap[q][0]] += float(n)*argsmap[q][1]
like image 719
Kashyap Nadig Avatar asked Jan 17 '11 14:01

Kashyap Nadig


People also ask

What is Pythonic code?

In the Python community, Pythonic is the word describing code that doesn't just get the syntax right but uses the language in the way it is intended to be used. It improves the overall code quality from maintainability, readability and efficiency perspective.

Which is faster or Python?

Python and Java are two of the most popular and robust programming languages. Java is generally faster and more efficient than Python because it is a compiled language. As an interpreted language, Python has simpler, more concise syntax than Java. It can perform the same function as Java in fewer lines of code.

Which is faster C or Python?

C is a faster language compared to Python as it is compiled. Python programs are usually slower than C programs as they are interpreted. In C, the type of the various variables must be declared when they are created, and only values of those particular types must be assigned to them.

Is Python fast in speed?

Despite having these many qualities, python has one drawback, which is it's slow speed. Being an interpreted language, python is slower than other programming languages. Still, we can overcome this problem using some tips.


2 Answers

You seem to create a lot of lambdas every time you parse. You really don't need a lambda, just a multiplier. Try this:

def _factor_for(what):
    if what == 'y': return 365
    elif what == 'M': return 30
    elif what in ('w', 'd', 'h', 's', 'm'): return 1
    else raise ValueError("Invalid specifier %r" % what)

for (n,q) in m:
    args[argsmap[q][0]] += _factor_for([q][1]) * n

Don't make _factor_for a method's local function or a method, though, to speed things up.

like image 165
9000 Avatar answered Oct 23 '22 08:10

9000


(I have not timed this, but) if you're going to use this function often it might be worth pre-compiling the regex expression.

Here's my take on your function:

re_timestr = re.compile("""
    ((?P<years>\d+)y)?\s*
    ((?P<months>\d+)M)?\s*
    ((?P<weeks>\d+)w)?\s*
    ((?P<days>\d+)d)?\s*
    ((?P<hours>\d+)h)?\s*
    ((?P<minutes>\d+)m)?\s*
    ((?P<seconds>\d+)s)?
""", re.VERBOSE)

def interval3(patt):
    p = {}
    match = re_timestr.match(patt)
    if not match: 
        raise ValueError("invalid pattern : %s" % (patt))

    for k,v in match.groupdict("0").iteritems():
        p[k] = int(v) # cast string to int

    p["days"] += p.pop("years")  * 365 # convert years to days
    p["days"] += p.pop("months") * 30  # convert months to days
    return datetime.timedelta(**p)

update

From this question, it looks like precompiling regex patterns does not bring about noticeable performance improvement since Python caches and reuses them anyway. You only save the time it takes to check the cache which, unless you are repeating it numerous times, is negligible.

update2

As you quite rightly pointed out, this solution does not support interval3("1h 30s" + "2h 10m"). However, timedelta supports arithmetic operations which means one you can still express it as interval3("1h 30s") + interval3("2h 10m").

Also, as mentioned by some of the comments on the question, you may want to avoid supporting "years" and "months" in the inputs. There's a reason why timedelta does not support those arguments; it cannot be handled correctly (and incorrect code are almost never elegant).

Here's another version, this time with support for float, negative values, and some error checking.

re_timestr = re.compile("""
    ^\s*
    ((?P<weeks>[+-]?\d+(\.\d*)?)w)?\s*
    ((?P<days>[+-]?\d+(\.\d*)?)d)?\s*
    ((?P<hours>[+-]?\d+(\.\d*)?)h)?\s*
    ((?P<minutes>[+-]?\d+(\.\d*)?)m)?\s*
    ((?P<seconds>[+-]?\d+(\.\d*)?)s)?\s*
    $
""", re.VERBOSE)

def interval4(patt):
    p = {}
    match = re_timestr.match(patt)
    if not match: 
        raise ValueError("invalid pattern : %s" % (patt))

    for k,v in match.groupdict("0").iteritems():
        p[k] = float(v) # cast string to int

    return datetime.timedelta(**p)

Example use cases:

>>> print interval4("1w 2d 3h4m")  # basic use
9 days, 3:04:00

>>> print interval4("1w") - interval4("2d 3h 4m") # timedelta arithmetic 
4 days, 20:56:00

>>> print interval4("0.3w -2.d +1.01h") # +ve and -ve floats 
3:24:36

>>> print interval4("0.3x") # reject invalid input 
Traceback (most recent call last): 
  File "date.py", line 19, in interval4
    raise ValueError("invalid pattern : %s" % (patt)) 
ValueError: invalid pattern : 0.3x

>>> print interval4("1h 2w") # order matters 
Traceback (most recent call last):   
  File "date.py", line 19, in interval4
    raise ValueError("invalid pattern : %s" % (patt)) 
ValueError: invalid pattern : 1h 2w
like image 3
Shawn Chin Avatar answered Oct 23 '22 08:10

Shawn Chin